The invention features a synthetic gene encoding a protein normally expressed in a mammalian cell or eukaryotic cell wherein at least one non-preferred or less preferred codon in the natural gene encoding the mammalian protein has been replaced by a preferred codon encoding the same amino acid.



1    CAATTCACCC GTAAGCTTGC CGCCACCATG GTGAGCAAGG GCGAGCAGCT
51   GTTCACCGGC GTGGTGCCCA TCCTGGTCGA GCTGGACGGC GACGTGAACG
101  GCCACAAGTT CAGCGTGTCC GGCGAGGGCG AGGGCGATGC CACCTACGGC
151  AXCCTGXCCC TGAAGTTCAT CTGCACCACC GGCAAGCTGC CCGTGCCCTG
201  GCCCACCCTC GTGACCACCT TCAGCTACGG CCTGCAGTGC TTCAGCCGCT
251  ACCCCGACCA CATGAAGCAG CACGACTTCT TCAACTCCGC CATGCCCGAA
301  GGCTACGTCC AGGAGCGCAC CATCTTCTTC AAGGACGACG GCAACTACAA
351  GACCCGCGCC GAGGTGAAGT TCGAGGGCGA CACCCTGGTG AACCGCATCG
401  AGCTGAAGGG CATCGACTTC AAGGAGGACG GCAACATCCT GGGGCACAAG
451  CTGGAGTACA ACTACAACAG CCACAACGTC TATATCATGG CCGACAAGCA
501  GAAGAACGGC ATCAAGGTGA ACTTCAAGAT CCGCCACAAC ATCGAGGACG
551  GCAGCGTGCA GCTCGCCGAC CACTACCAGC AGAACACCCC CATCGGCGAC
601  GGCCCCGTGC TGCTGCCCGA CAACCACTAC CTGAGCACCC AGTCCGCCCT
651  GAGCAAAGAC CCCAACGAGA AGCGCGATCA CATGGTCCTG CTGGAGTTCG
701  TGACCCCCGC CGGGATCACT CACGGCATGG ACGAGCTGTA CAAGTAAAGC
751  GGCCGCGGAT CC (SEQ ID NO: 40)














WO 97/11086 



PCT/US96/15088 



HIGH LEVEL EXPRESSION OF PROTEINS
Field of the Invention
The invention concerns genes and methods for expressing eukaryotic and viral proteins at high levels in eukaryotic cells.

Background of the Invention
Expression of eukaryotic gene products in prokaryotes is sometimes limited by the presence of codons that are infrequently used in E. coli. Expression of such genes can be enhanced by systematic substitution of the endogenous codons with codons overrepresented in highly expressed prokaryotic genes (Robinson et al. 1984). It is commonly supposed that rare codons cause pausing of the ribosome, which leads to a failure to complete the nascent polypeptide chain and an uncoupling of transcription and translation. The portion of the mRNA 3' to the stalled ribosome is exposed to cellular ribonucleases, which decreases the stability of the transcript.

Summary of the Invention
The invention features a synthetic gene encoding a protein normally expressed in a mammalian cell or other eukaryotic cell wherein at least one non-preferred or less preferred codon in the natural gene encoding the protein has been replaced by a preferred codon encoding the same amino acid.

Preferred codons are: Ala (gcc); Arg (cgc); Asn (aac); Asp (gac); Cys (tgc); Gln (cag); Gly (ggc); His (cac); Ile (atc); Leu (ctg); Lys (aag); Pro (ccc); Phe (ttc); Ser (agc); Thr (acc); Tyr (tac); and Val (gtg). Less preferred codons are: Gly (ggg); Ile (att); Leu (ctc); Ser (tcc); Val (gtc). All codons which do not fit the description of preferred codons or less preferred codons are non-preferred codons. In general, the degree of preference of a particular codon is indicated by the prevalence of the codon in highly expressed human genes, as indicated in Table 1 under the heading "High." For example, "atc" represents 77% of the Ile codons in highly expressed mammalian genes and is the preferred Ile codon; "att" represents 18% of the Ile codons in highly expressed mammalian genes and is the less preferred Ile codon. The sequence "ata" represents only 5% of the Ile codons in highly expressed human genes and is a non-preferred codon. Replacing a codon with another codon that is more prevalent in highly expressed human genes will generally increase expression of the gene in mammalian cells. Accordingly, the invention includes replacing a less preferred codon with a preferred codon as well as replacing a non-preferred codon with a preferred or less preferred codon.
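The three codon classes defined above can be sketched as a small lookup. The sketch assumes DNA codons (a, c, g, t). The names `PREFERRED`, `LESS_PREFERRED` and `classify_codon` are hypothetical illustrations and not part of the disclosure. Following the text literally, any codon outside both lists is treated as non-preferred.

```python
# Codon classes as listed in the text (DNA alphabet, lowercase).
PREFERRED = {
    "Ala": "gcc", "Arg": "cgc", "Asn": "aac", "Asp": "gac", "Cys": "tgc",
    "Gln": "cag", "Gly": "ggc", "His": "cac", "Ile": "atc", "Leu": "ctg",
    "Lys": "aag", "Pro": "ccc", "Phe": "ttc", "Ser": "agc", "Thr": "acc",
    "Tyr": "tac", "Val": "gtg",
}
LESS_PREFERRED = {"Gly": "ggg", "Ile": "att", "Leu": "ctc", "Ser": "tcc", "Val": "gtc"}


def classify_codon(codon):
    """Return 'preferred', 'less preferred' or 'non-preferred' for one DNA codon.

    Every codon outside both lists is non-preferred, per the text's definition.
    """
    c = codon.lower()
    if len(c) != 3 or any(base not in "acgt" for base in c):
        raise ValueError(f"not a DNA codon: {codon!r}")
    if c in PREFERRED.values():
        return "preferred"
    if c in LESS_PREFERRED.values():
        return "less preferred"
    return "non-preferred"


# The Ile example from the text: atc (77%), att (18%), ata (5%).
for codon in ("atc", "att", "ata"):
    print(codon, classify_codon(codon))
```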

By "protein normally expressed in a mammalian cell" is meant a protein which is expressed in mammalian cells under natural conditions. The term includes genes in the mammalian genome such as Factor VIII, Factor IX, interleukins, and other proteins. The term also includes genes which are expressed in a mammalian cell under disease conditions, such as oncogenes, as well as genes which are encoded by a virus (including a retrovirus) and are expressed in mammalian cells post-infection. By "protein normally expressed in a eukaryotic cell" is meant a protein which is expressed in a eukaryote under natural conditions. The term also includes genes which are expressed in a mammalian cell under disease conditions such as

In preferred embodiments, the synthetic gene is capable of expressing the mammalian or eukaryotic protein at a level which is at least 110%, 150%, 200%, 500%, 1,000%, 5,000% or 10,000% of that expressed by said natural gene in an in vitro mammalian cell culture system under identical conditions (i.e., same cell type, same culture conditions, same expression vector).

Suitable cell culture systems for measuring expression of the synthetic gene and corresponding natural gene are described below. Other suitable expression systems employing mammalian cells are well known to those skilled in the art and are described in, for example, the standard molecular biology reference works noted below. Vectors suitable for expressing the synthetic and natural genes are described below and in the standard reference works described below. By "expression" is meant protein expression. Expression can be measured using an antibody specific for the protein of interest. Such antibodies and measurement techniques are well known to those skilled in the art. By "natural gene" is meant the gene sequence (including naturally occurring allelic variants) which naturally encodes the protein.

In other preferred embodiments at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the codons in the natural gene are non-preferred codons.
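Whether a natural gene meets one of these thresholds can be checked by counting codon classes over the reading frame. The sketch below assumes an in-frame coding sequence and applies the text's definition literally. Under that definition, atg (Met), tgg (Trp), the Glu codons and stop codons, none of which appear in either list, count as non-preferred. A practical analysis might choose to exclude them. The function name is hypothetical.

```python
PREFERRED = {"gcc", "cgc", "aac", "gac", "tgc", "cag", "ggc", "cac", "atc",
             "ctg", "aag", "ccc", "ttc", "agc", "acc", "tac", "gtg"}
LESS_PREFERRED = {"ggg", "att", "ctc", "tcc", "gtc"}


def codon_class_fractions(seq):
    """Fraction of codons in an in-frame coding sequence falling in each class."""
    s = seq.lower()
    if not s or len(s) % 3:
        raise ValueError("expected a non-empty, in-frame coding sequence")
    codons = [s[i:i + 3] for i in range(0, len(s), 3)]
    counts = {"preferred": 0, "less preferred": 0, "non-preferred": 0}
    for c in codons:
        if c in PREFERRED:
            counts["preferred"] += 1
        elif c in LESS_PREFERRED:
            counts["less preferred"] += 1
        else:
            # Literal reading of the text: every other codon is non-preferred.
            counts["non-preferred"] += 1
    return {k: v / len(codons) for k, v in counts.items()}


print(codon_class_fractions("atcattataaga"))  # codons atc, att, ata, aga
# {'preferred': 0.25, 'less preferred': 0.25, 'non-preferred': 0.5}
```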

In a preferred embodiment the protein is a retroviral protein. In a more preferred embodiment the protein is a lentiviral protein. In an even more preferred embodiment the protein is an HIV protein. In other preferred embodiments the protein is gag, pol, env, gp120, or gp160. In other preferred embodiments the protein is a human protein.

The invention also features a method for preparing a synthetic gene encoding a protein normally expressed by a mammalian cell or other eukaryotic cell. The method includes identifying non-preferred and less-preferred codons in the natural gene encoding the protein and replacing one or more of the non-preferred and less-preferred codons with a preferred codon encoding the same amino acid as the replaced codon.
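The replacement step of this method can be sketched as a synonymous recoding pass. Each codon is translated with the standard genetic code and swapped for the preferred codon listed in the text for that amino acid. The sketch assumes an in-frame DNA coding sequence. Amino acids with no listed preferred codon (Met, Trp, Glu) and stop codons are left unchanged. The helper names are hypothetical, and the sketch omits refinements such as restriction-site or CpG constraints.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("tcag", repeat=3), AMINO_ACIDS)}

# Preferred codons from the text, keyed by one-letter amino acid code.
PREFERRED = {
    "A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc", "Q": "cag",
    "G": "ggc", "H": "cac", "I": "atc", "L": "ctg", "K": "aag", "P": "ccc",
    "F": "ttc", "S": "agc", "T": "acc", "Y": "tac", "V": "gtg",
}


def split_codons(seq):
    s = seq.lower()
    if len(s) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [s[i:i + 3] for i in range(0, len(s), 3)]


def translate(seq):
    return "".join(CODE[c] for c in split_codons(seq))


def optimize(seq):
    """Replace every codon by the preferred codon for the same amino acid.

    Met, Trp, Glu and stop codons have no entry in the preferred list and
    are kept as they are.
    """
    return "".join(PREFERRED.get(CODE[c], c) for c in split_codons(seq))


native = "ATAAGAGAAATG"  # Ile-Arg-Glu-Met
synthetic = optimize(native)
print(synthetic)                                  # atccgcgaaatg
print(translate(native) == translate(synthetic))  # True
```

Translating before and after recoding is a cheap guard that the substitution really is synonymous.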

Under some circumstances (e.g., to permit introduction of a restriction site) it may be desirable to replace a non-preferred codon with a less preferred codon rather than a preferred codon.

It is not necessary to replace all less preferred or non-preferred codons with preferred codons. Increased expression can be accomplished even with partial replacement. Under some circumstances it may be desirable to only partially replace non-preferred codons with preferred or less preferred codons in order to obtain an intermediate level of expression.
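Partial replacement can be sketched by recoding only a chosen share of the replaceable codons. The text does not say which codons to pick. This sketch simply draws them at random with a fixed seed, which is a hypothetical design choice made for reproducibility. The function name and the `fraction` and `seed` parameters are assumptions.

```python
import random
from itertools import product

# Standard genetic code; codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("tcag", repeat=3), AMINO_ACIDS)}

# Preferred codons from the text, keyed by one-letter amino acid code.
PREFERRED = {
    "A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc", "Q": "cag",
    "G": "ggc", "H": "cac", "I": "atc", "L": "ctg", "K": "aag", "P": "ccc",
    "F": "ttc", "S": "agc", "T": "acc", "Y": "tac", "V": "gtg",
}


def partially_optimize(seq, fraction, seed=0):
    """Swap each replaceable codon for its preferred codon with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    s = seq.lower()
    if len(s) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    rng = random.Random(seed)  # fixed seed -> reproducible choice of codons
    out = []
    for i in range(0, len(s), 3):
        codon = s[i:i + 3]
        best = PREFERRED.get(CODE[codon], codon)
        if best != codon and rng.random() < fraction:
            out.append(best)
        else:
            out.append(codon)
    return "".join(out)


native = "ATAAGAGGAACATTA"  # Ile-Arg-Gly-Thr-Leu, all non-preferred codons
for f in (0.0, 0.5, 1.0):
    print(f, partially_optimize(native, f))
```

With `fraction=0.0` the gene is unchanged, and with `fraction=1.0` it matches full recoding to preferred codons. Intermediate values give the intermediate designs the text mentions.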

In other preferred embodiments the invention features vectors (including expression vectors) comprising one or more of the synthetic genes.

By "vector" is meant a DNA molecule, derived, e.g., from a plasmid, bacteriophage, or mammalian or insect virus, into which fragments of DNA may be inserted or cloned. A vector will contain one or more unique restriction sites and may be capable of autonomous replication in a defined host or vehicle organism such that the cloned sequence is reproducible. Thus, by "expression vector" is meant any autonomous element capable of directing the synthesis of a protein. Such DNA expression vectors include mammalian plasmids and viruses.

The invention also features synthetic gene fragments which encode a desired portion of the protein. Such synthetic gene fragments are similar to the synthetic genes of the invention except that they encode only a portion of the protein. Such gene fragments preferably encode at least 50, 100, 150, or 500 contiguous amino acids of the protein.
In constructing the synthetic genes of the invention it may be desirable to avoid CpG sequences, as these sequences may cause gene silencing.
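Avoiding CpG first requires locating it. Some CpGs sit inside a codon, such as the preferred Arg codon cgc. Others arise at the junction of two codons, for example Phe (ttc) followed by Gly (ggc), both of which are preferred codons. The sketch below flags both kinds. The function names are hypothetical. How a designer then resolves each CpG, for instance by choosing a less preferred synonymous codon, is not specified in the text.

```python
def cpg_sites(seq):
    """0-based positions of every CpG dinucleotide (C followed by G) in seq."""
    s = seq.lower()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "cg"]


def junction_cpgs(seq):
    """CpGs spanning two codons: C in a third position, G opening the next codon."""
    return [i for i in cpg_sites(seq) if i % 3 == 2]


# Phe (ttc) followed by Gly (ggc): two preferred codons that create a CpG.
print(cpg_sites("ttcggc"), junction_cpgs("ttcggc"))        # [2] [2]
# Leu-Arg-Asn (ctg cgc aac): the CpG lies inside the Arg codon.
print(cpg_sites("ctgcgcaac"), junction_cpgs("ctgcgcaac"))  # [3] []
```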

The codon bias present in the HIV gp120 envelope gene is also present in the gag and pol genes. Thus, replacement of a portion of the non-preferred and less preferred codons found in these genes with preferred codons should produce a gene capable of higher level expression. A large fraction of the codons in the human genes encoding Factor VIII and Factor IX are non-preferred codons or less preferred codons. Replacement of a portion of these codons with preferred codons should yield genes capable of higher level expression in mammalian cell culture.

The synthetic genes of the invention can be introduced into the cells of a living organism. For example, vectors (viral or non-viral) can be used to introduce a synthetic gene into cells of a living organism for gene therapy.

Conversely, it may be desirable to replace preferred codons in a naturally occurring gene with less-preferred codons as a means of lowering expression.

Standard reference works describing the general principles of recombinant DNA technology include Watson, J.D. et al., Molecular Biology of the Gene, Volumes I and II, The Benjamin/Cummings Publishing Company, Inc., publisher, Menlo Park, CA (1987); Darnell, J.E. et al., Molecular Cell Biology, Scientific American Books, Inc., publisher, New York, N.Y. (1986); Old, R.W., et al., Principles of Gene Manipulation: An Introduction to Genetic Engineering, 2d edition, University of California Press, publisher, Berkeley, CA (1981); Maniatis, T., et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory, publisher, Cold Spring Harbor, NY (1989); and Current Protocols in Molecular Biology, Ausubel et al., Wiley Press, New York, NY (1992).

Detailed Description
Description of the Drawings
Figure 1 depicts the sequence of the synthetic gp120 and a synthetic gp160 gene in which codons have been replaced by those found in highly expressed human genes.

Figure 2 is a schematic drawing of the synthetic gp120 (HIV-1 MN) gene. The shaded portions marked v1 to v5 indicate hypervariable regions. The filled box indicates the CD4 binding site. A limited number of the unique restriction sites are shown: H (HindIII), Nh (NheI), P (PstI), Na (NaeI), M (MluI), R (EcoRI), A (AgeI) and No (NotI). The chemically synthesized DNA fragments which served as PCR templates are shown below the gp120 sequence, along with the locations of the primers used for their amplification.

Figure 3 is a photograph of the results of transient transfection assays used to measure gp120 expression. It shows gel electrophoresis of immunoprecipitated supernatants of 293T cells transfected with plasmids expressing gp120 encoded by the IIIB isolate of HIV-1 (gp120IIIb), by the MN isolate (gp120mn), by the MN isolate modified by substitution of the endogenous leader peptide with that of the CD5 antigen (gp120mnCD5L), or by the chemically synthesized gene encoding the MN variant with the human CD5 leader (syngp120mn). Supernatants were harvested following a 12 hour labeling period 60 hours post-transfection and immunoprecipitated with CD4:IgG1 fusion protein and protein A sepharose.

Figure 4 is a graph depicting the results of ELISA assays used to measure protein levels in supernatants of transiently transfected 293T cells. Supernatants of 293T cells transfected with plasmids expressing gp120 encoded by the IIIB isolate of HIV-1 (gp120IIIb), by the MN isolate (gp120mn), by the MN isolate modified by substitution of the endogenous leader peptide with that of CD5 antigen (gp120mnCD5L), or by the chemically synthesized gene encoding the MN variant with human CD5 leader (syngp120mn) were harvested after 4 days and tested in a gp120/CD4 ELISA. The level of gp120 is expressed in ng/ml.

Figure 5, panel A is a photograph of a gel illustrating the results of an immunoprecipitation assay used to measure expression of the native and synthetic gp120 in the presence of rev in trans and the RRE in cis. In this experiment 293T cells were transiently transfected by calcium phosphate coprecipitation of 10 µg of plasmid expressing: (A) the synthetic gp120MN sequence and RRE in cis, (B) the gp120 portion of HIV-1 IIIB, (C) the gp120 portion of HIV-1 IIIB and RRE in cis, all in the presence or absence of rev expression. The RRE constructs gp120IIIbRRE and syngp120mnRRE were generated using an EagI/HpaI RRE fragment cloned by PCR from an HIV-1 HXB2 proviral clone. Each gp120 expression plasmid was cotransfected with 10 µg of either pCMVrev or CDM7 plasmid DNA. Supernatants were harvested 60 hours post transfection, immunoprecipitated with CD4:IgG fusion protein and protein A agarose, and run on a 7% reducing SDS-PAGE. The gel exposure time was extended to allow the induction of gp120IIIbRRE by rev to be demonstrated.

Figure 5, panel B is a shorter exposure of a similar experiment in which syngp120mnRRE was cotransfected with or without pCMVrev. Figure 5, panel C is a schematic diagram of the constructs used in panel A.

Figure 6 is a comparison of the sequence of the wild-type rat THY-1 gene (wt) and a synthetic rat THY-1 gene (env) constructed by chemical synthesis and having the most prevalent codons found in the HIV-1 env gene.
Figure 7 is a schematic diagram of the synthetic rat THY-1 gene. The solid black box denotes the signal peptide. The shaded box denotes the sequences in the precursor which direct the attachment of a phosphatidylinositol glycan anchor. Unique restriction sites used for assembly of the THY-1 constructs are marked H (HindIII), M (MluI), S (SacI) and No (NotI). The positions of the synthetic oligonucleotides employed in the construction are shown at the bottom of the figure.

Figure 8 is a graph depicting the results of flow cytometry analysis. In this experiment, 293T cells were transiently transfected with either wild-type rat THY-1 (dark line), rat THY-1 with envelope codons (light line) or vector only (dotted line). The 293T cells were transfected with the different expression plasmids by calcium phosphate coprecipitation and stained 3 days after transfection with anti-rat THY-1 monoclonal antibody OX7 followed by a polyclonal FITC-conjugated anti-mouse IgG antibody.

Figure 9, panel A is a photograph of a gel illustrating the results of immunoprecipitation analysis of supernatants of human 293T cells transfected with either syngp120mn (A) or a construct, syngp120mn.rTHY-1env, which has the rTHY-1env gene in the 3' untranslated region of the syngp120mn gene (B). The syngp120mn.rTHY-1env construct was generated by inserting a NotI adapter into the blunted HindIII site of the rTHY-1env plasmid. Subsequently, a 0.5 kb NotI fragment containing the rTHY-1env gene was cloned into the NotI site of the syngp120mn plasmid and tested for correct orientation. Supernatants of 35S-labeled cells were harvested 72 hours post transfection, precipitated with CD4:IgG fusion protein and protein A agarose, and run on a 7% reducing SDS-PAGE.
Figure 9, panel B is a schematic diagram of the constructs used in the experiment depicted in panel A of FIG. 9.

Figure 10, panel A is a photograph of COS cells transfected with vector only, showing no GFP fluorescence. Figure 10, panel B is a photograph of COS cells transfected with a CDM7 expression plasmid encoding native GFP engineered to include a consensus translational initiation sequence. Figure 10, panel C is a photograph of COS cells transfected with an expression plasmid having the same flanking sequences and initiation consensus as in FIG. 10, panel B, but bearing a codon optimized gene sequence. Figure 10, panel D is a photograph of COS cells transfected with an expression plasmid as in FIG. 10, panel C, but bearing a Thr at residue 65 in place of Ser.

Description of the Preferred Embodiments
Construction of a Synthetic gp120 Gene Having Codons Found in Highly Expressed Human Genes

A codon frequency table for the envelope precursor of the LAV subtype of HIV-1 was generated using software developed by the University of Wisconsin Genetics Computer Group. The results of that tabulation are contrasted in Table 1 with the pattern of codon usage by a collection of highly expressed human genes. For any amino acid encoded by degenerate codons, the most favored codon of the highly expressed genes is different from the most favored codon of the HIV envelope precursor. Moreover, a simple rule describes the pattern of favored envelope codons wherever it applies: preferred codons maximize the number of adenine residues in the viral RNA. In all cases but one this means that the codon in which the third position is A is the most frequently used. In the special case of serine, three codons equally contribute one A residue to the mRNA; together these three comprise 85% of the serine codons actually used in envelope transcripts. A particularly striking example of the A bias is found in the codon choice for arginine, in which the AGA triplet comprises 88% of the arginine codons. In addition to the preponderance of A residues, a marked preference is seen for uridine among degenerate codons whose third residue must be a pyrimidine. Finally, the inconsistencies among the less frequently used variants can be accounted for by the observation that the dinucleotide CpG is underrepresented. Thus the third position is less likely to be G whenever the second position is C, as in the codons for alanine, proline, serine and threonine, and the CGX triplets for arginine are hardly used at all.
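The adenine rule described above can be made concrete by listing, for each amino acid, the synonymous codons that carry the most A residues. This is a sketch under the standard genetic code. The function names are hypothetical. It only lists candidate codons favored by the rule and does not predict the observed envelope frequencies in Table 1.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("tcag", repeat=3), AMINO_ACIDS)}


def synonymous_codons(aa):
    """All codons for a one-letter amino acid code, in alphabetical order."""
    return sorted(c for c, a in CODE.items() if a == aa)


def max_adenine_codons(aa):
    """Synonymous codons carrying the largest number of A residues."""
    codons = synonymous_codons(aa)
    most = max(c.count("a") for c in codons)
    return sorted(c for c in codons if c.count("a") == most)


print(max_adenine_codons("R"))  # ['aga']
print(max_adenine_codons("S"))  # ['agc', 'agt', 'tca']
print(max_adenine_codons("P"))  # ['cca']
```

For serine the rule returns the three codons that each contribute one A residue, and for arginine it singles out AGA. Both results are consistent with the envelope pattern described above.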
TABLE 1: Codon frequency in the HIV-1 IIIb env gene and in highly expressed human genes.
Codon frequency was calculated using the GCG program established by the University of Wisconsin Genetics Computer Group. Numbers represent the percentage of cases in which the particular codon is used. Codon usage frequencies of envelope genes of other HIV-1 virus isolates are comparable and show a similar bias.
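The percentages in a table of this kind express each codon's count as a share of all codons for the same amino acid. The sketch below is a stand-in computation under that reading. It is not the GCG program used by the authors, and it reports only codons that actually occur in the input. The function name is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code; codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("tcag", repeat=3), AMINO_ACIDS)}


def codon_usage(seq):
    """Percentage use of each codon among the codons for its amino acid."""
    s = seq.lower()
    if len(s) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    counts = Counter(s[i:i + 3] for i in range(0, len(s), 3))
    per_aa = Counter()
    for codon, n in counts.items():
        per_aa[CODE[codon]] += n
    table = {}
    for codon, n in sorted(counts.items()):
        aa = CODE[codon]
        table.setdefault(aa, {})[codon] = 100.0 * n / per_aa[aa]
    return table


usage = codon_usage("AGAAGAAGGATCATA")  # Arg, Arg, Arg, Ile, Ile
print(round(usage["R"]["aga"], 1), round(usage["R"]["agg"], 1))  # 66.7 33.3
print(usage["I"])  # {'ata': 50.0, 'atc': 50.0}
```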
In order to produce a gp120 gene capable of high level expression in mammalian cells, a synthetic gene encoding the gp120 segment of HIV-1 was constructed (syngp120mn), based on the sequence of the most common North American subtype, HIV-1 MN (Shaw et al., Science 226:1165, 1984; Gallo et al., Nature 321:119, 1986). In this synthetic gp120 gene nearly all of the native codons have been systematically replaced with codons most frequently used in highly expressed human genes (FIG. 1).

This synthetic gene was assembled from chemically synthesized oligonucleotides of 150 to 200 bases in length. If oligonucleotides exceeding 120 to 150 bases are chemically synthesized, the percentage of full-length product can be low, and the vast excess of material consists of shorter oligonucleotides. Since these shorter fragments inhibit cloning and PCR procedures, it can be very difficult to use oligonucleotides exceeding a certain length. In order to use crude synthesis material without prior purification, single-stranded oligonucleotide pools were PCR amplified before cloning. PCR products were purified in agarose gels and used as templates in the next PCR step. Two adjacent fragments could be co-amplified because of overlapping sequences at the end of either fragment. These fragments, which were between 350 and 400 bp in size, were subcloned into a pCDM7-derived plasmid containing the leader sequence of the CD5 surface molecule followed by a NheI/PstI/MluI/EcoRI/BamHI polylinker. Each of the restriction enzymes in this polylinker represents a site that is present at either the 5' or 3' end of the PCR-generated fragments. Thus, by sequential subcloning of each of the 4 long fragments, the whole gp120 gene was assembled. For each fragment 3 to 6 different clones were subcloned and sequenced prior to assembly. A schematic drawing of the method used to construct the synthetic gp120 is shown in FIG. 2. The sequence of the synthetic gp120 gene (and a synthetic gp160 gene created using the same approach) is presented in FIG. 1.
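The overlap-based assembly described above relies on adjacent pieces sharing sequence at their ends. A minimal sketch of that bookkeeping follows. It cuts a gene into pieces of bounded length with a fixed overlap and then verifies that the pieces rejoin into the original. The 200-base maximum echoes the oligonucleotide lengths in the text. The 20-base overlap, the random stand-in sequence and the function names are hypothetical. A real design would also check overlap melting temperatures and uniqueness.

```python
import random


def split_with_overlaps(seq, max_len=200, overlap=20):
    """Cut seq into pieces of at most max_len bases; neighbours share `overlap` bases."""
    if not 0 < overlap < max_len:
        raise ValueError("need 0 < overlap < max_len")
    step = max_len - overlap
    pieces, start = [], 0
    while True:
        pieces.append(seq[start:start + max_len])
        if start + max_len >= len(seq):
            return pieces
        start += step


def join_overlapping(pieces, overlap=20):
    """Rebuild the full sequence, checking that every overlap matches."""
    seq = pieces[0]
    for piece in pieces[1:]:
        if seq[-overlap:] != piece[:overlap]:
            raise ValueError("adjacent pieces do not overlap")
        seq += piece[overlap:]
    return seq


rng = random.Random(0)
gene = "".join(rng.choice("acgt") for _ in range(720))  # stand-in 720-nt sequence
pieces = split_with_overlaps(gene, max_len=200, overlap=20)
print([len(p) for p in pieces])           # [200, 200, 200, 180]
print(join_overlapping(pieces) == gene)  # True
```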

The mutation rate was considerable. The most commonly found mutations were short (1 nucleotide) and long (up to 30 nucleotides) deletions. In some cases it was necessary to exchange parts with either synthetic adapters or pieces from other subclones without mutation in that particular region. Some deviations from strict adherence to optimized codon usage were made to accommodate the introduction of restriction sites into the resulting gene to facilitate the replacement of various segments (FIG. 2). These unique restriction sites were introduced into the gene at approximately 100 bp intervals. The native HIV leader sequence was exchanged with the highly efficient leader peptide of the human CD5 antigen to facilitate secretion (Aruffo et al., Cell 61:1303, 1990). The plasmid used for construction is a derivative of the mammalian expression vector pCDM7 transcribing the inserted gene under the control of a strong human CMV immediate early promoter.
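Whether an engineered site is actually unique in the assembled gene can be checked by scanning for each recognition sequence. The sketch below uses the recognition sequences of the enzymes named in Figure 2 and in the polylinker. All of these sites are palindromic, so a scan of one strand also covers the complementary strand. The dictionary, the function names and the short test construct are hypothetical illustrations.

```python
# Recognition sequences of enzymes named in the text. All are palindromic,
# so scanning one strand also finds sites on the complementary strand.
SITES = {
    "HindIII": "aagctt", "NheI": "gctagc", "PstI": "ctgcag", "NaeI": "gccggc",
    "MluI": "acgcgt", "EcoRI": "gaattc", "AgeI": "accggt", "NotI": "gcggccgc",
    "BamHI": "ggatcc",
}


def site_positions(seq, site):
    """0-based start of every (possibly overlapping) occurrence of site in seq."""
    s, t = seq.lower(), site.lower()
    return [i for i in range(len(s) - len(t) + 1) if s.startswith(t, i)]


def unique_sites(seq, sites=SITES):
    """Enzymes whose site occurs exactly once in seq, mapped to its position."""
    found = {}
    for enzyme, site in sites.items():
        hits = site_positions(seq, site)
        if len(hits) == 1:
            found[enzyme] = hits[0]
    return found


construct = "aagcttgctagcaaaagcttgaattc"
print(site_positions(construct, SITES["HindIII"]))  # [0, 14]
print(unique_sites(construct))                      # {'NheI': 6, 'EcoRI': 20}
```

In this toy construct HindIII occurs twice and would not be usable for swapping a single segment. NheI and EcoRI are unique.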

To compare the wild-type and synthetic gp120 coding sequences, the synthetic gp120 coding sequence was inserted into a mammalian expression vector and tested in transient transfection assays. Several different native gp120 genes were used as controls to exclude variations in expression levels between different virus isolates and artifacts induced by distinct leader sequences. The gp120 HIV IIIb construct used as control was generated by PCR using a SalI/XhoI HIV-1 HXB2 envelope fragment as template. To exclude PCR-induced mutations, a KpnI/EarI fragment containing approximately 1.2 kb of the gene was exchanged with the respective sequence from the proviral clone. The wild-type gp120mn constructs used as controls were cloned by PCR from HIV-1 MN infected C8166 cells (AIDS Repository, Rockville, MD) and expressed gp120 with either the native envelope leader or a CD5 leader sequence. Since proviral clones were not available in this case, two clones of each construct were tested to avoid PCR artifacts. To determine the amount of secreted gp120 semi-quantitatively, supernatants of 293T cells transiently transfected by calcium phosphate coprecipitation were immunoprecipitated with soluble CD4:immunoglobulin fusion protein and protein A sepharose.

The results of this analysis (FIG. 3) show that the synthetic gene product is expressed at a very high level compared to that of the native gp120 controls. The molecular weight of the synthetic gp120 gene product was comparable to that of the control proteins (FIG. 3) and appeared to be in the range of 100 to 110 kDa. The slightly faster migration can be explained by the fact that in some tumor cell lines like 293T, glycosylation is either incomplete or altered to some extent.

To compare expression more accurately, gp120 protein levels were quantitated using a gp120 ELISA with CD4 in the immobilized phase. This analysis shows (FIG. 4) that ELISA data were comparable to the immunoprecipitation data, with a gp120 concentration of approximately 125 ng/ml for the synthetic gp120 gene, and less than the background cutoff (5 ng/ml) for all the native gp120 genes. Thus, expression of the synthetic gp120 gene appears to be at least one order of magnitude higher than that of the wild-type gp120 genes. In the experiment shown, the increase was at least 25-fold.
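The 25-fold figure is a lower bound. Native gp120 levels fell below the 5 ng/ml background cutoff, so 5 ng/ml bounds the denominator from above and hence bounds the ratio from below:

```latex
\text{fold increase}
  = \frac{[\mathrm{gp120}]_{\text{synthetic}}}{[\mathrm{gp120}]_{\text{native}}}
  \geq \frac{125~\text{ng/ml}}{5~\text{ng/ml}}
  = 25
```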
The Role of rev in gp120 Expression

Since rev appears to exert its effect at several steps in the expression of a viral transcript, the possible role of non-translational effects in the improved expression of the synthetic gp120 gene was tested. First, to rule out the possibility that negative signal elements conferring either increased mRNA degradation or nuclear retention had been eliminated by changing the nucleotide sequence, cytoplasmic mRNA levels were tested. Cytoplasmic RNA was prepared by NP40 lysis of transiently transfected 293T cells and subsequent elimination of the nuclei by centrifugation. Cytoplasmic RNA was subsequently prepared from lysates by multiple phenol extractions and precipitation, spotted on nitrocellulose using a slot blot apparatus, and finally hybridized with an envelope-specific probe.

Briefly, cytoplasmic mRNA from 293 cells transfected with CDM7, gp120IIIB, or syngp120 was isolated 36 hours post transfection. Cytoplasmic RNA of HeLa cells infected with wild-type vaccinia virus, or with recombinant virus expressing gp120IIIb or the synthetic gp120 gene under the control of the 7.5 promoter, was isolated 16 hours post infection. Equal amounts were spotted on nitrocellulose using a slot blot device and hybridized with randomly labeled 1.5 kb gp120IIIb and syngp120 fragments or human beta-actin. RNA expression levels were quantitated by scanning the hybridized membranes with a phosphorimager. The procedures used are described in greater detail below.
This experiment demonstrated that there was no significant difference in the mRNA levels of cells transfected with either the native or synthetic gp120 gene. In fact, in some experiments the cytoplasmic mRNA level of the synthetic gp120 gene was even lower than that of the native gp120 gene.

These data were confirmed by measuring expression from recombinant vaccinia viruses. Human 293 cells or HeLa cells were infected with vaccinia virus expressing wild-type gp120IIIb or syngp120mn at a multiplicity of infection of at least 10. Supernatants were harvested 24 hours post infection and immunoprecipitated with CD4:immunoglobulin fusion protein and protein A sepharose. The procedures used in this experiment are described in greater detail below.
This experiment showed that the increased expression of the synthetic gene was still observed when the endogenous gene product and the synthetic gene product were expressed from vaccinia virus recombinants under the control of the strong mixed early and late 7.5k promoter. Because vaccinia virus mRNAs are transcribed and translated in the cytoplasm, increased expression of the synthetic envelope gene in this experiment cannot be attributed to improved export from the nucleus. This experiment was repeated in two additional human cell types, the kidney cancer cell line 293 and HeLa cells. As with transfected 293T cells, mRNA levels were similar in 293 cells infected with either recombinant vaccinia virus.

Codon Usage in Lentivirus

Because it appears that codon usage has a significant impact on expression in mammalian cells, the codon frequency in the envelope genes of other retroviruses was examined. This study found no clear pattern of codon preference between retroviruses in general. However, when viruses from the lentivirus genus, to which HIV-1 belongs, were analyzed separately, a codon usage bias almost identical to that of HIV-1 was found. A codon frequency table from the envelope glycoproteins of a variety of (predominantly type C) retroviruses excluding the lentiviruses was prepared and compared with a codon frequency table created from the envelope sequences of four lentiviruses not closely related to HIV-1 (caprine arthritis encephalitis virus, equine infectious anemia virus, feline immunodeficiency virus, and visna virus) (Table 2). The codon usage pattern for lentiviruses is strikingly similar to that of HIV-1: in all cases but one, the preferred codon for HIV-1 is the same as the preferred codon for the other lentiviruses. The exception is proline, which is encoded by CCT in 41% of non-HIV lentiviral envelope residues and by CCA in 40% of residues, a situation which clearly also reflects a significant preference for the triplet ending in A. The pattern of codon usage by the non-lentiviral envelope proteins does not show a similar predominance of A residues, and is also not as skewed toward third position C and G residues as is the codon usage for the highly expressed human genes. In general, non-lentiviral retroviruses appear to exploit the different codons more equally, a pattern they share with less highly expressed human genes.
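The contrast drawn here concerns which base occupies the third codon position. One simple way to summarize that contrast is to compute the share of codons ending in each base. The sketch assumes an in-frame coding sequence that starts at its first base. The function name is hypothetical, and this measure is an illustration rather than the tabulation used for Table 2.

```python
def third_position_composition(seq):
    """Fraction of codons ending in each base for an in-frame coding sequence."""
    s = seq.lower()
    if not s or len(s) % 3:
        raise ValueError("expected a non-empty, in-frame coding sequence")
    thirds = s[2::3]  # third base of every codon
    return {base: thirds.count(base) / len(thirds) for base in "acgt"}


print(third_position_composition("AAACCAGCGTTT"))  # codons aaa, cca, gcg, ttt
# {'a': 0.5, 'c': 0.0, 'g': 0.25, 't': 0.25}
```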
TABLE 2:
the University of Wisconsin Genetics Computer Group. Numbers
represent the percentage in which a particular codon is used. Codon
usage of non-lentiviral retroviruses was compiled from the envelope
precursor sequences of bovine leukemia virus, feline leukemia virus,
human T-cell leukemia virus type I, human T-cell lymphotropic virus
type II, the mink cell focus-forming isolate of murine leukemia virus
(MuLV), the Rauscher spleen focus-forming isolate, the 10A1 isolate,
the 4070A amphotropic isolate and the myeloproliferative leukemia
virus isolate, and from rat leukemia virus, simian sarcoma virus,
simian T-cell leukemia virus, leukemogenic retrovirus T1223/B and
gibbon ape leukemia virus. The codon frequency tables for the
non-HIV, non-SIV lentiviruses were compiled from the envelope
precursor sequences for caprine arthritis encephalitis virus, equine
infectious anemia virus, feline immunodeficiency virus, and visna
virus.



In addition to the prevalence of A-containing codons,
lentiviral codons adhere to the HIV pattern of strong CpG
underrepresentation, so that the third position of alanine,
proline, serine and threonine triplets is rarely G. The
retroviral envelope triplets show a similar, but less
pronounced, underrepresentation of CpG. The most obvious
difference between lentiviruses and other retroviruses with
respect to CpG prevalence lies in the usage of the CGX variant
of arginine triplets, which is represented reasonably
frequently among the retroviral envelope coding sequences but
is almost never present among the comparable lentivirus
sequences.
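The two CpG measures discussed here can be tallied directly from a coding sequence: third-position G in the xCN codon families of alanine, proline, serine and threonine (which creates an xCG triplet), and the share of arginine codons drawn from the CGX (CGN) family rather than AGA/AGG. This is an illustrative sketch; the function names are hypothetical and the input is assumed to be an in-frame coding sequence.

```python
# Ala (GCN), Pro (CCN), Ser (TCN) and Thr (ACN): a third-position G makes an xCG triplet.
XCN_PREFIXES = {"GC", "CC", "TC", "AC"}


def triplets(seq):
    """Split a coding sequence into in-frame codons (DNA alphabet)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def xcg_fraction(seq):
    """Fraction of Ala/Pro/Ser/Thr (xCN) codons whose third base is G."""
    family = [c for c in triplets(seq) if c[:2] in XCN_PREFIXES]
    return sum(c[2] == "G" for c in family) / len(family) if family else 0.0


def cgn_arginine_fraction(seq):
    """Fraction of arginine codons that are CGN rather than AGA/AGG."""
    arg = [c for c in triplets(seq) if c[:2] == "CG" or c in ("AGA", "AGG")]
    return sum(c.startswith("CG") for c in arg) / len(arg) if arg else 0.0


print(xcg_fraction("GCGCCACCGACA"))           # 0.5
print(cgn_arginine_fraction("CGAAGAAGGCGT"))  # 0.5
```

On the pattern described above, both fractions would be expected to be close to zero for lentiviral envelope genes.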

Differences in rev Dependence Between Native and
Synthetic gp120

To examine whether regulation by rev is connected to HIV-1
codon usage, the influence of rev on the expression of both the
native and the synthetic gene was investigated. Since regulation
by rev requires the rev-binding site RRE in cis, constructs were
made in which this binding site was cloned into the 3'
untranslated region of both the native and the synthetic gene.
These plasmids were co-transfected with rev or a control plasmid
in trans into 293T cells, and gp120 expression levels in
supernatants were measured semiquantitatively by
immunoprecipitation. The procedures used in this experiment are
described in greater detail below.

As shown in FIG. 5, panel A and FIG. 5, panel B, rev
upregulates the native gp120 gene but has no effect on the
expression of the synthetic gp120 gene. Thus, the action of rev
is not apparent on a substrate that lacks the native viral
envelope coding sequence.

Expression of a synthetic rat THY-1 gene with HIV envelope
codons

The above-described experiment suggests that "envelope
sequences" must in fact be present for rev regulation. To test
this hypothesis, a synthetic version of the gene encoding the
small, typically highly expressed cell surface protein rat THY-1
antigen was prepared. The synthetic version of the rat THY-1 gene
was designed to have a codon usage like that of HIV gp120. In
designing this synthetic gene, AUUUA sequences, which are
associated with mRNA instability, were avoided. In addition, two
restriction sites were introduced to simplify manipulation of
the resulting gene (FIG. 6). This synthetic gene with the HIV
envelope codon usage (rTHY-1env) was generated using three 150
to 170 mer oligonucleotides (FIG. 7). In contrast to the
syngp120mn gene, PCR products were directly cloned and assembled
in pUC12, and subsequently cloned into pCDM7.
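One way to picture the design of rTHY-1env is to back-translate the protein with codons taken in order of preference from an A/T-rich, HIV envelope-like table, while rejecting any choice that creates ATTTA, the DNA form of the AUUUA instability motif. The sketch below assumes a small hypothetical preference table (`ENV_LIKE_PREFS`) and a hypothetical `back_translate` helper. It does not reproduce the actual rTHY-1env codon table and does not model the two added restriction sites.

```python
# Hypothetical A/T-rich codon preferences, listed most-preferred first.
# Illustrative only: this is NOT the codon table used to design rTHY-1env.
ENV_LIKE_PREFS = {
    "I": ["ATT", "ATA"],
    "Y": ["TAT", "TAC"],
    "K": ["AAA", "AAG"],
}

AVOID = ("ATTTA",)  # DNA form of the AUUUA mRNA instability motif


def back_translate(protein, prefs, avoid=AVOID):
    """Choose codons in order of preference, backtracking to avoid motifs."""
    window = max(len(m) for m in avoid) + 2  # any new hit must overlap the last codon

    def extend(i, dna):
        if i == len(protein):
            return dna
        for codon in prefs[protein[i]]:
            candidate = dna + codon
            if any(m in candidate[-window:] for m in avoid):
                continue  # this codon would create a forbidden motif
            result = extend(i + 1, candidate)
            if result is not None:
                return result
        return None  # no codon works here; the caller tries its next choice

    dna = extend(0, "")
    if dna is None:
        raise ValueError("no codon choice avoids the motif(s)")
    return dna


print(back_translate("IYK", ENV_LIKE_PREFS))  # ATATATAAA
```

In the example, the first-choice path ATT TAT would create ATTTA, so the search backs off to ATA for isoleucine. Backtracking, rather than a single greedy pass, is needed because a motif can straddle a codon boundary and only become unavoidable one codon later.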

Expression levels of native rTHY-1 and of rTHY-1 with the HIV
envelope codons were quantitated by immunofluorescence of
transiently transfected 293T cells. FIG. 8 shows that expression
of the native THY-1 gene is almost two orders of magnitude above
the background level of the control transfected cells (pCDM7).
In contrast, expression of the synthetic rat THY-1 is
substantially lower than that of the native gene (shown by the
shift of the peak toward a lower channel number).

To prove that no negative sequence elements promoting mRNA
degradation were inadvertently introduced, a construct was
generated in which the rTHY-1env gene was cloned at the 3' end
of the synthetic gp120 gene (FIG. 9, panel B). In this
experiment 293T cells were transfected with either the
syngp120mn gene or the syngp120/rat THY-1env fusion gene
(syngp120mn.rTHY-1env). Expression was measured by
immunoprecipitation with CD4:IgG fusion protein and protein A
agarose. The procedures used in this experiment are described in
greater detail below.

Since the synthetic gp120 gene has a UAG stop codon, the
downstream rTHY-1env is not translated from this transcript. If
negative elements conferring enhanced degradation were present in
the sequence, gp120 protein levels expressed from this construct
should be decreased in comparison to the syngp120mn construct
without rTHY-1env. FIG. 9, panel A, shows that the expression of
both constructs is similar, indicating that the low expression
must be linked to translation.

Rev-dependent expression of synthetic rat THY-1
gene with envelope codons

To explore whether rev is able to regulate expression of a rat
THY-1 gene having env codons, a construct was made with a
rev-binding site at the 3' end of the rTHY-1env open reading
frame. To measure the rev-responsiveness of the rat THY-1env
construct having a 3' RRE, human 293T cells were cotransfected
with ratTHY-1envrre and either CDM7 or pCMVrev. At 60 hours post
transfection, cells were detached with 1 mM EDTA in PBS and
stained with the OX-7 anti-rTHY-1 mouse monoclonal antibody and a
secondary FITC-conjugated antibody. Fluorescence intensity was
measured using an EPICS XL cytofluorometer. These procedures are
described in greater detail below.

In repeated experiments, a slight increase of rTHY-1env
expression was detected if rev was cotransfected with the
rTHY-1env gene. To further increase the sensitivity of the assay
system, a construct expressing a secreted version of rTHY-1env
was generated. This construct should produce more reliable data
because the accumulated amount of secreted protein in the
supernatant reflects protein production over an extended period,
in contrast to surface-expressed protein, which appears to
reflect the current production rate more closely. A gene capable
of expressing a secreted form was prepared by PCR using forward
and reverse primers annealing 3' of the endogenous leader
sequence and 5' of the sequence motif required for
phosphatidylinositol glycan anchorage, respectively. The PCR
product was cloned into a plasmid that already contained a CD5
leader sequence, thus generating a construct in which the
membrane anchor has been deleted and the leader sequence replaced
by a heterologous (and probably more efficient) leader peptide.

The rev-responsiveness of the secreted form of ratTHY-1env was
measured by immunoprecipitation of supernatants of human 293T
cells cotransfected with a plasmid expressing a secreted form of
ratTHY-1env with the RRE sequence in cis (rTHY-1envPI-rre) and
either CDM7 or pCMVrev. The rTHY-1envPI-RRE construct was made by
PCR using the oligonucleotide:

cgcggggctagcgcaaagagtaataagtttaac (SEQ ID NO: 38) as a
forward primer, the oligonucleotide:

cgcggatcccttgtattttgtactaata (SEQ ID NO: 39) as reverse
primer, and the synthetic rTHY-1env construct as template. After
digestion with NheI and NotI, the PCR fragment was cloned into a
plasmid containing CD5 leader and RRE sequences. Supernatants of
35S-labeled cells were harvested 72 hours post transfection,
precipitated with the mouse monoclonal antibody OX7 against
rTHY-1 and anti-mouse IgG Sepharose, and run on a 12% reducing
SDS-PAGE.
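A quick way to confirm which restriction sites a primer carries is to scan it for the recognition sequences of the enzymes in question. In this sketch (the helper name `find_sites` is hypothetical), the forward primer SEQ ID NO: 38 shows an NheI site, and the reverse primer SEQ ID NO: 39, as printed, shows a BamHI site. The scan only reports what each printed sequence contains.

```python
# Recognition sequences of several enzymes named in this document. All are
# palindromic, so scanning one strand also finds sites on the other strand.
SITES = {
    "NheI": "GCTAGC",
    "BamHI": "GGATCC",
    "NotI": "GCGGCCGC",
    "EcoRI": "GAATTC",
    "MluI": "ACGCGT",
    "PstI": "CTGCAG",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for sites present in seq."""
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        starts = [i for i in range(len(seq) - len(site) + 1)
                  if seq.startswith(site, i)]
        if starts:
            hits[name] = starts
    return hits


forward = "cgcggggctagcgcaaagagtaataagtttaac"  # SEQ ID NO: 38
reverse = "cgcggatcccttgtattttgtactaata"       # SEQ ID NO: 39
print(find_sites(forward))  # {'NheI': [6]}
print(find_sites(reverse))  # {'BamHI': [3]}
```

The few bases 5' of each site (here CGCGG) are a clamp that gives the enzyme room to cut near the end of the PCR product.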
In this experiment the induction of rTHY-1env by rev was much
more prominent and clear-cut than in the above-described
experiment, and strongly suggests that rev is able to
translationally regulate transcripts that are suppressed by
low-usage codons.

Rev-independent expression of an rTHY-1env:immunoglobulin
fusion protein

To test whether low-usage codons must be present throughout the
whole coding sequence or whether a short region is sufficient to
confer rev-responsiveness, an rTHY-1env:immunoglobulin fusion
protein was generated. In this construct the rTHY-1env gene
(without the sequence motif responsible for phosphatidylinositol
glycan anchorage) is linked to the human IgG1 hinge, CH2 and CH3
domains. This construct was generated by anchor PCR using primers
with NheI and BamHI restriction sites and rTHY-1env as template.
The PCR fragment was cloned into a plasmid containing the leader
sequence of the CD5 surface molecule and the hinge, CH2 and CH3
parts of human IgG1 immunoglobulin. A HindIII/EagI fragment
containing the rTHY-1enveg1 insert was subsequently cloned into a
pCDM7-derived plasmid with the RRE sequence.

To measure the response of the rTHY-1env/immunoglobulin fusion
gene (rTHY-1enveg1rre) to rev, human 293T cells were
cotransfected with rTHY-1enveg1rre and either pCDM7 or pCMVrev.
The rTHY-1enveg1rre construct was made by anchor PCR using
forward and reverse primers with NheI and BamHI restriction
sites, respectively. The PCR fragment was cloned into a plasmid
containing a CD5 leader and human IgG1 hinge, CH2 and CH3
domains. Supernatants of 35S-labeled cells were harvested 72
hours post transfection, precipitated with the mouse monoclonal
antibody OX7 against rTHY-1 and anti-mouse IgG Sepharose, and run
on a 12% reducing SDS-PAGE. The procedures used are described in
greater detail below.
As with the product of the rTHY-1envPI- gene, this
rTHY-1env/immunoglobulin fusion protein is secreted into the
supernatant. Thus, this gene should be responsive to rev
induction. However, in contrast to rTHY-1envPI-, cotransfection
of rev in trans induced no, or only a negligible, increase of
rTHY-1enveg1 expression.

The expression of rTHY-1:immunoglobulin fusion protein with
native rTHY-1 or HIV envelope codons was measured by
immunoprecipitation. Briefly, human 293T cells were transfected
with either rTHY-1enveg1 (env codons) or rTHY-1wteg1 (native
codons). The rTHY-1wteg1 construct was generated in a manner
similar to that used for the rTHY-1enveg1 construct, except that
a plasmid containing the native rTHY-1 gene was used as template.
Supernatants of 35S-labeled cells were harvested 72 hours post
transfection, precipitated with the mouse monoclonal antibody OX7
against rTHY-1 and anti-mouse IgG Sepharose, and run on a 12%
reducing SDS-PAGE. The procedures used in this experiment are
described in greater detail below.

Expression levels of rTHY-1enveg1 were decreased in comparison
to a similar construct with wild-type rTHY-1 as the fusion
partner, but were still considerably higher than those of
rTHY-1env. Accordingly, both parts of the fusion protein
influenced expression levels: adding the rTHY-1env segment did
not restrict expression to the low level seen for rTHY-1env
alone. Thus, regulation by rev appears to be ineffective unless
protein expression is almost completely suppressed.

Codon preference in HIV-1 envelope genes

Direct comparison between the codon usage frequency of HIV
envelope and that of highly expressed human genes reveals a
striking difference for all twenty amino acids. One simple
measure of the statistical significance of this codon preference
is the finding that among the nine amino acids with twofold codon
degeneracy, the favored third residue is A or U in all nine. The
probability that all nine of two equiprobable choices will be the
same is approximately 0.004, and hence by any conventional
measure the third residue choice cannot be considered random.
Further evidence of a skewed codon preference is found among the
more degenerate codons, where a strong selection for triplets
bearing adenine can be seen. This contrasts with the pattern for
highly expressed genes, which favor codons bearing C, or less
commonly G, in the third position of codons with threefold or
higher degeneracy.
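The 0.004 figure can be reproduced as follows, assuming (as the argument does) that each of the nine twofold-degenerate amino acids independently picks either third-position option with probability 1/2. "All the same" counts both directions; requiring the A/U direction specifically halves the value.

```latex
\begin{aligned}
P(\text{all nine choices identical}) &= 2\left(\tfrac{1}{2}\right)^{9} = \tfrac{1}{256} \approx 0.0039,\\
P(\text{all nine favor A or U})      &= \left(\tfrac{1}{2}\right)^{9} = \tfrac{1}{512} \approx 0.0020.
\end{aligned}
```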

The systematic exchange of native codons with codons of highly
expressed human genes dramatically increased expression of gp120.
A quantitative analysis by ELISA showed that expression of the
synthetic gene was at least 25-fold higher than that of native
gp120 after transient transfection into human 293 cells. The
concentration levels in the ELISA experiment shown were rather
low. Since the ELISA used for quantification is based on gp120
binding to CD4, only native, non-denatured material was detected.
This may explain the apparent low expression. Measurement of
cytoplasmic mRNA levels demonstrated that the difference in
protein expression is due to translational differences and not to
mRNA stability.

Retroviruses in general do not show a preference for A and T
similar to that found for HIV. But if this family is divided into
two subgroups, lentiviruses and non-lentiviral retroviruses, a
similar preference for A and, less frequently, T is detected at
the third codon position for lentiviruses. Thus, the available
evidence suggests that lentiviruses retain a characteristic
pattern of envelope codons not because of an inherent advantage
to the reverse transcription or replication of such residues, but
rather for some reason peculiar to the physiology of that class
of viruses. As already mentioned, the major difference between
lentiviruses and non-complex retroviruses is the presence of
additional regulatory and nonessential accessory genes in
lentiviruses. Thus, one simple explanation for the restriction of
envelope expression might be that an important regulatory
mechanism of one of these additional molecules is based on it. In
fact, one of these proteins is rev, which most likely has
homologues in all lentiviruses. Thus codon usage in viral mRNA may
be used to create a class of transcripts that is susceptible to
the stimulatory action of rev. This hypothesis was tested using a
strategy similar to that above, but this time codon usage was
changed in the inverse direction: the codons of a highly
expressed cellular gene were replaced with the most frequently
used codons in the HIV envelope. As expected, expression levels
were considerably lower than those of the native molecule, by
almost two orders of magnitude when analyzed by immunofluorescence
of the surface-expressed molecule (see 4.7). If rev was
coexpressed in trans and an RRE element was present in cis, only
a slight induction was found for the surface molecule. However, if
THY-1 was expressed as a secreted molecule, the induction by rev
was much more prominent, supporting the above hypothesis. This can
probably be explained by the accumulation of secreted protein in
the supernatant, which considerably amplifies the rev effect. If
rev induces only a minor increase for surface molecules in
general, induction of HIV envelope by rev cannot serve to increase
surface abundance, but rather to increase the intracellular gp160
level. It is completely unclear at the moment why this should be
the case.
To test whether small subregions of a gene are sufficient to
restrict expression and render it rev-dependent,
rTHY-1env:immunoglobulin fusion proteins were generated in which
only about one third of the total gene had the envelope codon
usage. Expression levels of this construct were intermediate,
indicating that the rTHY-1env negative sequence element is not
dominant over the immunoglobulin part. This fusion protein was
not, or only slightly, rev-responsive, indicating that only genes
that are almost completely suppressed can be rev-responsive.

Another characteristic feature found in the codon frequency
tables is a striking underrepresentation of CpG-containing
triplets. In a comparative study of codon usage in E. coli,
yeast, Drosophila and primates, it was shown that in a high
number of analyzed primate genes the 8 least used codons comprise
all codons with the CpG dinucleotide sequence. Avoidance of codons
containing this dinucleotide motif was also found in the
sequences of other retroviruses. It seems plausible that the
underrepresentation of CpG-bearing triplets is related to
avoidance of gene silencing by methylation of CpG cytosines. The
observed number of CpG dinucleotides for HIV as a whole is about
one fifth of that expected on the basis of the base composition.
This might indicate that the possibility of high expression is
restored, and that the gene in fact has to be highly expressed at
some point during viral pathogenesis.
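CpG depletion relative to base composition is usually expressed as an observed/expected ratio, with the expected count estimated from the C and G content. One common formulation is (number of CpG × length) / (number of C × number of G), so a depletion to about one fifth of the expected count corresponds to a ratio of about 0.2. A minimal sketch, with the hypothetical helper `cpg_obs_exp`:

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG x length) / (#C x #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # no expectation can be formed without both bases
    cpg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return cpg * len(seq) / (c * g)


print(cpg_obs_exp("CCGG"))  # 1.0
print(cpg_obs_exp("GGCC"))  # 0.0
```

The ratio is only meaningful for long sequences; for a whole viral genome, values well below 1 indicate the kind of depletion described above.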

The results presented herein clearly indicate that codon
preference has a severe effect on protein levels, and suggest that
translational elongation is controlling mammalian gene expression.
However, other factors may play a role. First, the abundance of
mRNAs that are not maximally loaded with ribosomes in eukaryotic
cells indicates that initiation is rate limiting for translation
in at least some cases, since otherwise all transcripts would be
completely covered by ribosomes. Furthermore, if ribosome stalling
and subsequent mRNA degradation were the mechanism, suppression by
rare codons could most likely not be reversed by any regulatory
mechanism like the one presented herein. One possible explanation
for the influence of both initiation and elongation on
translational activity is that the rate of initiation, or access
to ribosomes, is controlled in part by cues distributed throughout
the RNA, such that the lentiviral codons predispose the RNA to
accumulate in a pool of poorly initiated RNAs. However, this
limitation need not be kinetic; for example, the choice of codons
could influence the probability that a given translation product,
once initiated, is properly completed. Under this mechanism, an
abundance of less favored codons would incur a significant
cumulative probability of failure to complete the nascent
polypeptide chain. The sequestered RNA would then be lent an
improved rate of initiation by the action of rev. Since adenine
residues are abundant in rev-responsive transcripts, it could be
that RNA adenine methylation mediates this translational
suppression.
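The cumulative-failure idea can be made concrete with a toy model, assumed here purely for illustration: if each codon independently aborts elongation with some small probability, the chance of completing the chain is the product of the per-codon survival probabilities, so even a modest per-codon risk compounds over many disfavored codons. The helper name and the numbers below are hypothetical, not measurements.

```python
import math


def completion_probability(per_codon_failure):
    """Chance a ribosome finishes the chain if each codon independently
    aborts elongation with the given probability (hypothetical model)."""
    return math.prod(1.0 - p for p in per_codon_failure)


# Hypothetical numbers: 100 disfavored codons at 1% risk, 300 favored at 0%.
risks = [0.01] * 100 + [0.0] * 300
print(round(completion_probability(risks), 3))  # 0.366
```

Under these assumed numbers roughly two of every three initiated chains would fail, even though each individual disfavored codon is almost always read through.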

Detailed Procedures

The following procedures were used in the above-described
experiments.
Sequence Analysis

Sequence analyses employed the software developed by the
University of Wisconsin Genetics Computer Group.
Plasmid constructions 

Plasmid constructions employed the following methods. Vector and
insert DNA were digested at a concentration of 0.5 µg/10 µl in the
appropriate restriction buffer for 1-4 hours (total reaction
volume approximately 30 µl). Digested vector was treated with 10%
(v/v) of 1 µg/ml calf intestine alkaline phosphatase for 30 min
prior to gel electrophoresis. Both vector and insert digests (5 to
10 µl each) were run on a 1.5% low melting agarose gel with TAE
buffer. Gel slices containing bands of interest were transferred
into a 1.5 ml reaction tube, melted at 65 °C and added directly to
the ligation without removal of the agarose. Ligations were
typically done in a total volume of 25 µl in 1x Low buffer, 1x
Ligation additions with 200-400 U of ligase, 1 µl of vector, and
4 µl of insert. When necessary, 5' overhanging ends were filled in
by adding 1/10 volume of 250 µM dNTPs and 2-5 U of Klenow
polymerase to heat-inactivated or phenol-extracted digests and
incubating for approximately 20 min at room temperature. When
necessary, 3' overhanging ends were blunted by adding 1/10 volume
of 2.5 mM dNTPs and 5-10 U of T4 DNA polymerase to heat-inactivated
or phenol-extracted digests, followed by incubation at 37 °C for
30 min. The following buffers were used in these reactions: 10x
Low buffer (60 mM Tris HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl,
4 mg/ml BSA, 70 mM β-mercaptoethanol, 0.02% NaN3); 10x Medium
buffer (60 mM Tris HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml
BSA, 70 mM β-mercaptoethanol, 0.02% NaN3); 10x High buffer (60 mM
Tris HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA, 70 mM
β-mercaptoethanol, 0.02% NaN3); 10x Ligation additions (1 mM ATP,
20 mM DTT, 1 mg/ml BSA, 10 mM spermidine); 50x TAE (2 M Tris
acetate, 50 mM EDTA).
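The buffer components are kept as concentrated stocks (10x and 50x) and diluted to 1x in the reaction, so the volume of each stock follows from C1·V1 = C2·V2. A small sketch with a hypothetical helper `stock_volume`:

```python
def stock_volume(final_volume, stock_fold, final_fold=1.0):
    """Volume of an n-fold stock needed for a given final strength (C1*V1 = C2*V2)."""
    return final_volume * final_fold / stock_fold


# 25 ul ligation at 1x: each 10x component contributes 2.5 ul.
print(stock_volume(25, 10))    # 2.5
# 1 l (1000 ml) of 1x TAE from the 50x stock: 20 ml (same units in and out).
print(stock_volume(1000, 50))  # 20.0
```

For the 25 µl ligation described above, the two 10x components (Low buffer and Ligation additions) take 2.5 µl each, leaving 15 µl after the 1 µl of vector and 4 µl of insert for ligase and water.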
Oligonucleotide synthesis and purification 
Oligonucleotides were produced on a Milligen 8750 synthesizer
(Millipore). The columns were eluted with 1 ml of 30% ammonium
hydroxide, and the eluted oligonucleotides were deblocked at 55 °C
for 6 to 12 hours. After deblocking, 150 µl of oligonucleotide
were precipitated with a 10x volume of unsaturated n-butanol in
1.5 ml reaction tubes, followed by centrifugation at 15,000 rpm in
a microfuge. The pellet was washed with 70% ethanol and
resuspended in 50 µl of H2O. The concentration was determined by
measuring the optical density at 260 nm in a dilution of 1:333
(1 OD260 = 30 µg/ml).
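The OD260 readings convert to stock concentrations by multiplying by the dilution factor and by the stated conversion factor (30 µg/ml per OD unit for the oligonucleotides here, 50 µg/ml for plasmid DNA in the large-scale preparation). With a 1:333 dilution, each OD unit of the diluted sample corresponds to about 10 mg/ml in the stock. The readings below are hypothetical, and `concentration_ug_per_ml` is an illustrative helper.

```python
def concentration_ug_per_ml(od260, dilution_factor, ug_per_ml_per_od):
    """Stock concentration from an A260 reading of a diluted aliquot."""
    return od260 * dilution_factor * ug_per_ml_per_od


# Hypothetical reading: oligonucleotide diluted 1:333 reads OD260 = 0.10.
print(round(concentration_ug_per_ml(0.10, 333, 30)))  # 999 (about 1 mg/ml)
# Hypothetical reading: plasmid diluted 1:200 reads OD260 = 0.25.
print(concentration_ug_per_ml(0.25, 200, 50))         # 2500.0
```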

The following oligonucleotides were used for construction of the
synthetic gp120 gene (all sequences shown in this text are in 5'
to 3' direction).
oligo 1 forward (NheI): cgc ggg cta gcc acc gag aag ctg (SEQ ID NO: 1).

oligo 1: acc gag aag ctg tgg gtg acc gtg tac tac
ggc gtg ccc gtg tgg aag ag ag gcc acc acc acc ctg ttc tgc
gcc agc gac gcc aag gcg tac gac acc gag gtg cac aac gtg
tgg gcc acc cag gcg tgc gtg ccc acc gac ccc aac ccc cag
gag gtg gag ctc gtg aac gtg acc gag aac ttc aac at (SEQ
ID NO: 2).

oligo 1 reverse: cca cca tgt tgt tct tcc aca tgt tga agt tct c (SEQ ID NO: 3).

oligo 2 forward: gac cga gaa ctt caa cat gtg gaa gaa caa cat (SEQ ID NO: 4).

oligo 2: tgg aag aac aac atg gtg gag cag atg cat
gag gac atc atc agc ctg tgg gac cag agc ctg aag ccc tgc
gtg aag ctg acc cc ctg tgc gtg acc tg aac tgc acc gac ctg
agg aac acc acc aac acc aac ac agc acc gcc aac aac aac
agc aac agc gag ggc acc atc aag ggc ggc gag atg (SEQ ID
NO: 5).

oligo 2 reverse (PstI): gtt gaa gct gca gtt ctt cat ctc gcc gcc ctt (SEQ ID NO: 6).

oligo 3 forward (PstI): gaa gaa ctg cag ctt caa cat cac cac cag c (SEQ ID NO: 7).

oligo 3: aac atc acc acc agc atc cgc gac aag atg
cag aag gag tac gcc ctg ctg tac aag ctg gat atc gtg agc
atc gac aac gac agc acc agc tac cgc ctg atc tcc tgc aac
acc agc gtg atc acc cag gcc tgc ccc aag atc agc ttc gag
ccc atc ccc atc cac tac tgc gcc ccc gcc ggc ttc gcc (SEQ
ID NO: 8).

oligo 3 reverse: gaa ctt ctt gtc ggc ggc gaa gcc ggc ggg (SEQ ID NO: 9).

oligo 4 forward: gcg ccc ccg ccg gct tcg cca tcc tga agt gca acg aca aga agt tc (SEQ ID NO: 10).

oligo 4: gcc gac aag aag ttc agc ggc aag ggc agc
tgc aag aac gtg agc acc gtg cag tgc acc cac ggc atc cgg
ccg gtg gtg agc acc cag ctc ctg ctg aac ggc agc ctg
gcc gag gag gag gtg gtg atc cgc agc gag aac ttc acc gac
aac gcc aag acc atc atc gtg cac ctg aat gag agc gtg cag
atc (SEQ ID NO: 11).

oligo 4 reverse (MluI): agt tgg gac gcg tgc agt tga tct gca cgc tct c (SEQ ID NO: 12).

oligo 5 forward (MluI): gag agc gtg cag atc aac tgc acg cgt ccc (SEQ ID NO: 13).

oligo 5: aac tgc acg cgt ccc aac tac aac aag cgc
aag cgc atc cac atc ggc ccc ggg cgc gcc ttc tac acc acc
aag aac atc atc ggc acc atc ctc cag gcc cac tgc aac atc
tct aga (SEQ ID NO: 14).

oligo 5 reverse: gtc gtt cca ctt ggc tct aga gat gtt gca (SEQ ID NO: 15).

oligo 6 forward: gca aca tct cta gag cca agt gga acg ac (SEQ ID NO: 16).

oligo 6: gcc aag tgg aac gac acc ctg cgc cag atc
gtg agc aag ctg aag gag cag ttc aag aac aag acc atc gtg
ttc ac cag agc agc ggc ggc gac ccc gag atc gtg atg
cac agc ttc aac tgc ggc ggc (SEQ ID NO: 17).

oligo 6 reverse (EcoRI): gca gta gaa gaa ttc gcc gcc gca gtt ga (SEQ ID NO: 18).

oligo 7 forward (EcoRI): tca act gcg gcg gcg aat tct tct act gc (SEQ ID NO: 19).

oligo 7: ggc gaa ttc ttc tac tgc aac acc agc ccc
ctg ttc aac agc acc tgg aac ggc aac aac acc tgg aac aac
acc acc ggc agc aac aac aat att acc ctc cag tgc aag atc
aag cag atc atc aac atg tgg cag gag gtg ggc aag gcc atg
tac gcc ccc ccc atc gag ggc cag atc cgg tgc agc agc (SEQ
ID NO: 20).

oligo 7 reverse: gca gac cgg tga tgt tgc tgc tgc acc gga tct ggc cct c (SEQ ID NO: 21).

oligo 8 forward: cga ggg cca gat ccg gtg cag cag caa cat cac cgg tct g (SEQ ID NO: 22).

oligo 8: aac atc acc ggt ctg ctg ctg acc cgc gac
ggc ggc aag gac acc gac acc aac gac acc gaa atc ttc cgc
ccc ggc ggc ggc gac atg cgc gac aac tgg aga tct gag ctg
tac aag tac aag gtg gtg acg atc gag ccc ctg ggc gtg gcc
ccc acc aag gcc aag cgc cgc gtg gtg cag cgc gag aag cgc
(SEQ ID NO: 23).

oligo 8 reverse (NotI): cgc ggg cgg ccg ctt tag cgc ttc tcg cgc tgc acc ac (SEQ ID NO: 24).

The following oligonucleotides were used for the construction of
the ratTHY-1env gene.

oligo 1 forward (BamHI/HindIII): cgc ggg gga tcc aag ctt acc atg att cca gta ata agt (SEQ ID NO: 25).

oligo 1: atg aat cca gta ata agt ata aca tta tta
tta agt gta tta caa atg agt aga gga caa aga gta ata agt
tta aca gca tct tta gta aat caa aat ttg aga tta gat tgt
aga cat gaa aat aat aca aat ttg cca ata caa cat gaa ttt
tca tta acg (SEQ ID NO: 26).

oligo 1 reverse (EcoRI/MluI): cgc ggg gaa ttc acg cgt taa tga aaa ttc atg ttg (SEQ ID NO: 27).

oligo 2 forward (BamHI/MluI): cgc gga tcc acg cgt gaa aaa aaa aaa cat (SEQ ID NO: 28).

oligo 2: cgt gaa aaa aaa aaa cat gta tta agt gga
aca tta gga gta cca gaa cat aca tat aga agt aga gta aat
ttg ttt agt gat aga ttc ata aaa gta tta aca tta gca aat
ttt aca aca aaa gat gaa gga gat tat atg tgt gag (SEQ ID
NO: 29).

oligo 2 reverse (EcoRI/SacI): cgc gaa ttc gag ctc aca cat ata atc tcc (SEQ ID NO: 30).

oligo 3 forward (BamHI/SacI): cgc gga tcc gag ctc aga gta agt gga caa (SEQ ID NO: 31).

oligo 3: ctc aga gta agt gga caa aat cca aca agt
agt aat aaa aca ata aat gta ata aga gat aaa tta gta aaa
tgt ga gga ata agt tta tta gta caa aat aca agt tgg tta
tta tta tta tta tta agt tta agt ttt tta caa gca aca gat
ttt ata agt tta tga (SEQ ID NO: 32).

oligo 3 reverse (EcoRI/NotI): cgc gaa ttc gcg gcc gct tca taa act tat aaa atc (SEQ ID NO: 33).
Polymerase Chain Reaction

Short, overlapping 15 to 25 mer oligonucleotides annealing at both
ends were used to amplify the long oligonucleotides by polymerase
chain reaction (PCR). Typical PCR conditions were: 35 cycles,
55 °C annealing temperature, 0.2 sec extension time. PCR products
were gel purified, phenol extracted, and used in a subsequent PCR
to generate longer fragments consisting of two adjacent small
fragments. These longer fragments were cloned into a CDM7-derived
plasmid containing a leader sequence of the CD5 surface molecule
followed by an NheI/PstI/MluI/EcoRI/BamHI polylinker.
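Each long oligonucleotide is amplified with short primers that anneal at its ends; the reverse primer's reverse complement must match the template's 3' end and then carry the added restriction sites. The sketch below checks this for ratTHY-1env oligo 1 (SEQ ID NO: 26, 3' end only) and oligo 1 reverse (SEQ ID NO: 27). The helper names `revcomp` and `three_prime_overlap` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA sequence (spaces removed)."""
    return seq.replace(" ", "").translate(COMPLEMENT)[::-1]


def three_prime_overlap(template, reverse_primer):
    """Length of the template's 3' end matched by the primer's reverse complement."""
    t = template.replace(" ", "").lower()
    rc = revcomp(reverse_primer).lower()
    # Try the longest possible overlap first and shrink until one fits.
    for k in range(min(len(t), len(rc)), 0, -1):
        if t.endswith(rc[:k]):
            return k
    return 0


# 3' end of ratTHY-1env oligo 1 (SEQ ID NO: 26) and oligo 1 reverse (SEQ ID NO: 27).
template_end = "cca ata caa cat gaa ttt tca tta acg"
reverse_primer = "cgc ggg gaa ttc acg cgt taa tga aaa ttc atg ttg"
k = three_prime_overlap(template_end, reverse_primer)
print(k)                            # 21
print(revcomp(reverse_primer)[k:])  # cgtgaattccccgcg
```

Here the 21 nt overlap is followed by CGT, which completes an MluI site (ACGCGT) with the template's final ACG, and then an EcoRI site (GAATTC).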

The following solutions were used in these reactions: 10x PCR
buffer (500 mM KCl, 100 mM Tris HCl, pH 7.5, 8 mM MgCl2, 2 mM each
dNTP). The final buffer was supplemented with 10% DMSO to increase
fidelity of the Taq polymerase.
Small scale DNA preparation 

Transformed bacteria were grown in 3 ml LB cultures for more than
6 hours or overnight. Approximately 1.5 ml of each culture was
poured into 1.5 ml microfuge tubes, spun for 20 seconds to pellet
cells and resuspended in 200 µl of solution I. Subsequently 400 µl
of solution II and 300 µl of solution III were added. The
microfuge tubes were capped, mixed and spun for > 30 sec.
Supernatants were transferred into fresh tubes and phenol
extracted once. DNA was precipitated by filling the tubes with
isopropanol, mixing, and spinning in a microfuge for > 2 min. The
pellets were rinsed in 70% ethanol and resuspended in 50 µl dH2O
containing 10 µl of RNase A. The following media and solutions
were used in these procedures: LB medium (1.0% NaCl, 0.5% yeast
extract, 1.0% tryptone); solution I (10 mM EDTA pH 8.0); solution
II (0.2 M NaOH, 1.0% SDS); solution III (2.5 M KOAc, 2.5 M glacial
acetic acid); phenol (pH adjusted to 6.0, overlaid with TE); TE
(10 mM Tris HCl, pH 7.5, 1 mM EDTA pH 8.0).
Large scale DNA preparation 

One liter cultures of transformed bacteria were grown 24 to 36
hours (MC1061p3 transformed with pCDM derivatives) or 12 to 16
hours (MC1061 transformed with pUC derivatives) at 37 °C in either
M9 bacterial medium (pCDM derivatives) or LB (pUC derivatives).
Bacteria were spun down in 1 liter bottles using a Beckman J6
centrifuge at 4,200 rpm for 20 min. The pellet was resuspended in
40 ml of solution I. Subsequently, 80 ml of solution II and 40 ml
of solution III were added and the bottles were shaken
semivigorously until lumps of 2 to 3 mm size developed. The bottle
was spun at 4,200 rpm for 5 min and the supernatant was poured
through cheesecloth into a 250 ml bottle. Isopropanol was added to
the top and the bottle was spun at 4,200 rpm for 10 min. The
pellet was resuspended in 4.1 ml of solution I and added to 4.5 g
of cesium chloride, 0.3 ml of 10 mg/ml ethidium bromide, and
0.1 ml of 1% Triton X100 solution. The tubes were spun in a
Beckman J2 high speed centrifuge at 10,000 rpm for 5 min. The
supernatant was transferred into Beckman Quick Seal
ultracentrifuge tubes, which were then sealed and spun in a
Beckman ultracentrifuge using an NVT90 fixed angle rotor at
80,000 rpm for > 2.5 hours. The band was extracted by visible
light using a 1 ml syringe and 20 gauge needle. An equal volume of
dH2O was added to the extracted material, DNA was extracted once
with n-butanol saturated with 1 M sodium chloride, followed by
addition of an equal volume of 10 M ammonium acetate/1 mM EDTA.
The material was poured into a 13 ml snap tube, which was then
filled to the top with absolute ethanol, mixed, and spun in a
Beckman J2 centrifuge at 10,000 rpm for 10 min. The pellet was
rinsed with 70% ethanol and resuspended in 0.5 to 1 ml of H2O.
The DNA concentration was determined by measuring the optical
density at 260 nm in a dilution of 1:200 (1 OD260 = 50 µg/ml).

The following media and buffers were used in these
procedures: M9 bacterial medium (10 g M9 salts, 10 g
casamino acids (hydrolyzed), 10 ml M9 additions, 7.5
µg/ml tetracycline (500 µl of a 15 mg/ml stock solution),
12.5 µg/ml ampicillin (125 µl of a 10 mg/ml stock
solution)); M9 additions (10 mM CaCl2, 100 mM MgSO4, 200
µg/ml thiamine, 70% glycerol); LB medium (1.0% NaCl, 0.5%
yeast extract, 1.0% tryptone); Solution I (10 mM EDTA,
pH 8.0); Solution II (0.2 M NaOH, 1.0% SDS); Solution III
(2.5 M KOAc, 2.5 M HOAc).
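Each antibiotic addition in the recipe follows the dilution relation C1·V1 = C2·V2. Below is a minimal sketch for the tetracycline entry. The text does not state the final medium volume, so the 1 liter used here is an assumption.

```python
def stock_volume_ml(final_conc_ug_per_ml, final_volume_ml, stock_conc_mg_per_ml):
    """Volume of stock to add, from C1 * V1 = C2 * V2."""
    stock_conc_ug_per_ml = stock_conc_mg_per_ml * 1000.0  # mg/ml -> ug/ml
    return final_conc_ug_per_ml * final_volume_ml / stock_conc_ug_per_ml

# Tetracycline at 7.5 ug/ml from a 15 mg/ml stock, ASSUMING 1 liter of medium
v = stock_volume_ml(7.5, 1000.0, 15.0)
print(f"{v * 1000:.0f} ul of stock")  # 500 ul, the volume given in the recipe
```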
Sequencing 

Synthetic genes were sequenced by the Sanger
dideoxynucleotide method. In brief, 20 to 50 µg of
double-stranded plasmid DNA were denatured in 0.5 M NaOH
for 5 min. Subsequently the DNA was precipitated with 1/10
volume of sodium acetate (pH 5.2) and 2 volumes of
ethanol and centrifuged for 5 min. The pellet was washed
with 70% ethanol and resuspended at a concentration of
1 µg/µl. The annealing reaction was carried out with 4 µg
of template DNA and 40 ng of primer in 1x annealing
buffer in a final volume of 10 µl. The reaction was
heated to 65°C and slowly cooled to 37°C. In a separate
tube, 1 µl of 0.1 M DTT, 2 µl of labeling mix, 0.75 µl of
dH2O, 1 µl of [35S]dATP (10 µCi), and 0.25 µl of
Sequenase (12 U/µl) were combined for each reaction. Five
µl of this mix were added to each annealed
primer-template tube and incubated for 5 min at room
temperature. For each labeling reaction, 2.5 µl of each
of the 4 termination mixes were added to a Terasaki plate
and prewarmed at 37°C. At the end of the incubation
period, 3.5 µl of labeling reaction were added to each of
the 4 termination mixes. After 5 min, 4 µl of stop
solution were added to each reaction and the Terasaki
plate was incubated at 80°C for 10 min in an oven. The
sequencing reactions were run on 5% denaturing
polyacrylamide gels. An acrylamide solution was prepared
by adding 200 ml of 10x TBE buffer and 957 ml of dH2O to
100 g of acrylamide:bisacrylamide (29:1). A 5%
polyacrylamide, 46% urea, 1x TBE gel was prepared by
combining 38 ml of acrylamide solution and 28 g of urea.
Polymerization was initiated by the addition of 400 µl of
10% ammonium peroxodisulfate and 60 µl of TEMED. Gels
were poured using silanized glass plates and sharktooth
combs and run in 1x TBE buffer at 60 to 100 W for 2 to 4
hours (depending on the region to be read). Gels were
transferred to Whatman blotting paper, dried at 80°C for
about 1 hour, and exposed to x-ray film at room
temperature. Typical exposure time was 12 hours. The
following solutions were used in these procedures: 5x
Annealing buffer (200 mM Tris-HCl, pH 7.5, 100 mM MgCl2,
250 mM NaCl); Labeling Mix (7.5 µM each dCTP, dGTP, and
dTTP); Termination Mixes (80 µM each dNTP, 50 mM NaCl,
8 µM ddNTP (one each)); Stop solution (95% formamide,
20 mM EDTA, 0.05% bromphenol blue, 0.05% xylene cyanol);
5x TBE (0.9 M Tris borate, 20 mM EDTA); Polyacrylamide
solution (96.7 g polyacrylamide, 3.3 g bisacrylamide,
200 ml 1x TBE, 957 ml dH2O).
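The per-reaction labeling components listed above sum to the 5 µl added to each annealed template. This makes them easy to scale into a master mix. Below is a minimal sketch. The 10% pipetting overage is an assumption, not part of the protocol.

```python
# Per-reaction volumes (ul) for the labeling mix, taken from the protocol text
PER_REACTION_UL = {
    "0.1 M DTT": 1.0,
    "labeling mix": 2.0,
    "dH2O": 0.75,
    "[35S]dATP (10 uCi)": 1.0,
    "Sequenase (12 U/ul)": 0.25,
}

def master_mix(n_reactions: int, overage: float = 0.1) -> dict:
    """Scale per-reaction volumes to n reactions plus an ASSUMED fractional overage."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}

per_reaction_total = sum(PER_REACTION_UL.values())
print(per_reaction_total)  # 5.0 ul, the volume added to each primer-template tube
for name, vol in master_mix(8).items():
    print(f"{name:22s} {vol:6.2f} ul")
```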
RNA isolation

Cytoplasmic RNA was isolated from calcium
phosphate transfected 293T cells 36 hours post
transfection and from vaccinia infected Hela cells 16
hours post infection, essentially as described by Gilman
(Gilman, Preparation of cytoplasmic RNA from tissue
culture cells. In Current Protocols in Molecular
Biology, Ausubel et al., eds., Wiley & Sons, New York,
1992). Briefly, cells were lysed in 400 µl lysis buffer,
nuclei were spun out, and SDS and proteinase K were added
to 0.2% and 0.2 mg/ml respectively. The cytoplasmic
extracts were incubated at 37°C for 20 min,
phenol/chloroform extracted twice, and precipitated. The
RNA was dissolved in 100 µl buffer I and incubated at
37°C for 20 min. The reaction was stopped by adding 25
µl stop buffer, and the RNA was precipitated again.

The following solutions were used in this
procedure: Lysis Buffer (TRUSTEE containing 50 mM
Tris pH 8.0, 100 mM NaCl, 5 mM MgCl2, 0.5% NP40); Buffer
I (TRUSTEE buffer with 10 mM MgCl2, 1 mM DTT, 0.5 U/µl
placental RNase inhibitor, 0.1 U/µl RNase-free DNase I);
Stop buffer (50 mM EDTA, 1.5 M NaOAc, 1.0% SDS).
Slot blot analysis 

For slot blot analysis, 10 µg of cytoplasmic RNA
was dissolved in 50 µl dH2O, to which 150 µl of 10x
SSC/18% formaldehyde were added. The solubilized RNA was
then incubated at 65°C for 15 min and spotted onto a
nitrocellulose membrane with a slot blot apparatus.
Radioactively labeled probes of 1.5 kb gp120IIIb and
syngp120mn fragments were used for hybridization. Each of
the two fragments was random-labeled in a 50 µl reaction
with 10 µl of 5x oligo-labeling buffer, 8 µl of 2.5 mg/ml
BSA, 4 µl of α-[32P]dCTP (20 µCi/µl; 6000 Ci/mmol), and
5 U of Klenow fragment. After 1 to 3 hours of incubation
at 37°C, 100 µl of TRUSTEE were added and unincorporated
α-[32P]dCTP was eliminated using a G50 spin column.
Activity was measured in a Beckman beta counter, and equal
specific activities were used for hybridization. Membranes
were prehybridized for 2 hours and hybridized for 12 to 24
hours at 42°C with 0.5 x 10^6 cpm probe per ml
hybridization fluid. The membrane was washed twice (5 min)
with washing buffer I at room temperature and for one hour
in washing buffer II at 65°C, and then exposed to x-ray
film. Similar results were obtained using a 1.1 kb
NotI/SfiI fragment of pCDM7 containing the 3' untranslated
region. Control hybridizations were done in parallel
with a random-labeled human beta-actin probe. RNA
expression was quantitated by scanning the hybridized
nitrocellulose membranes with a Magnetic Dynamics
phosphor imager.
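The hybridization step specifies 0.5 x 10^6 cpm of probe per ml of hybridization fluid. The volume of labeled probe to add therefore depends on the fluid volume and on the probe's measured activity. Below is a minimal sketch. The 10 ml fluid volume and the 100,000 cpm/µl count are hypothetical.

```python
def probe_volume_ul(hyb_volume_ml, probe_cpm_per_ul, target_cpm_per_ml=0.5e6):
    """Volume of labeled probe giving the target cpm per ml of hybridization fluid."""
    total_cpm_needed = target_cpm_per_ml * hyb_volume_ml
    return total_cpm_needed / probe_cpm_per_ul

# Hypothetical: 10 ml of hybridization fluid, probe counting 100,000 cpm/ul
print(probe_volume_ul(10.0, 1.0e5))  # 50.0
```

Because equal specific activities were used for the gp120IIIb and syngp120mn probes, the same cpm target applies to both membranes.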

The following solutions were used in this
procedure:

5x Oligo-labeling buffer (250 mM Tris-HCl, pH 8.0, 25 mM
MgCl2, 5 mM β-mercaptoethanol, 2 mM dATP, 2 mM dGTP, mM
dTTP, 1 M Hepes pH 6.6, 1 mg/ml hexanucleotides [dNTP]6);
Hybridization Solution ( M sodium phosphate, 250 mM
NaCl, 7% SDS, 1 mM EDTA, 5% dextran sulfate, 50%
formamide, 100 µg/ml denatured salmon sperm DNA); Washing
buffer I (2x SSC, 0.1% SDS); Washing buffer II (0.5x SSC,
0.1% SDS); 20x SSC (3 M NaCl, 0.3 M Na3 citrate, pH
adjusted to 7.0).

Vaccinia recombination

Vaccinia recombination used a modification of the
method described by Romeo and Seed (Romeo and Seed,
Cell 64:1037, 1991). Briefly, CV1 cells at 70 to 90%
confluency were infected with 1 to 3 µl of a wild-type
vaccinia stock WR (2 x 10^8 pfu/ml) for 1 hour in
culture medium without calf serum. After 24 hours, the
cells were transfected by calcium phosphate with 25 µg
TKG plasmid DNA per dish. After an additional 24 to 48
hours the cells were scraped off the plate, spun down,
and resuspended in a volume of 1 ml. After 3
freeze/thaw cycles, trypsin was added to 0.05 mg/ml and
lysates were incubated for 20 min. A dilution series of
10, 1 and 0.1 µl of this lysate was used to infect small
dishes (6 cm) of CV1 cells that had been pretreated with
12.5 µg/ml mycophenolic acid, 0.25 mg/ml xanthine and 1.36
mg/ml hypoxanthine for 6 hours. Infected cells were
cultured for 2 to 3 days, and subsequently stained with
the monoclonal antibody NEA9301 against gp120 and an
alkaline phosphatase-conjugated secondary antibody.
Cells were incubated with 0.33 mg/ml NBT and 0.16 mg/ml
BCIP in AP buffer and finally overlaid with 1% agarose in
PBS. Positive plaques were picked and resuspended in
100 µl Tris pH 9.0. The plaque purification was repeated
once. To produce high titer stocks the infection was
slowly scaled up. Finally, one large plate of Hela cells
was infected with half of the virus of the previous
round. Infected cells were detached in 3 ml of PBS,
lysed with a Dounce homogenizer and cleared of larger
debris by centrifugation. VPE-8 recombinant vaccinia
stocks were kindly provided by the AIDS repository,
Rockville, MD, and express HIV-1 IIIB gp120 under the 7.5
mixed early/late promoter (Earl et al., J. Virol. 65:31,
1991). In all experiments with recombinant vaccinia, cells
were infected at a multiplicity of infection of at least
10.
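A multiplicity of infection of at least 10 means at least 10 plaque-forming units per cell. The stock volume needed follows directly from the cell number and the titer. Below is a minimal sketch. The cell count is hypothetical. The 2 x 10^8 pfu/ml titer is the wild-type WR stock figure from the text, used here only for illustration, since the recombinant stock titers are not given.

```python
def virus_volume_ul(moi, n_cells, titer_pfu_per_ml):
    """Volume of virus stock delivering `moi` plaque-forming units per cell."""
    pfu_needed = moi * n_cells
    return pfu_needed / titer_pfu_per_ml * 1000.0  # ml -> ul

# Hypothetical: 5 x 10^6 cells infected at MOI 10 with a 2 x 10^8 pfu/ml stock
print(virus_volume_ul(10, 5e6, 2e8))  # 250.0
```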

The following solution was used in this procedure:
AP buffer (100 mM Tris-HCl, pH 9.5, 100 mM NaCl, 5 mM
MgCl2).

Cell culture 

The monkey kidney carcinoma cell lines CV1 and
Cos7, the human kidney carcinoma cell line 293T, and the
human cervix carcinoma cell line Hela were obtained from
the American Type Culture Collection and were maintained
in supplemented IMDM. They were kept on 10 cm tissue
culture plates and typically split 1:5 to 1:20 every 3 to
4 days. The following medium was used in this
procedure:

Supplemented IMDM (90% Iscove's modified Dulbecco medium,
10% calf serum (iron-complemented, heat-inactivated for 30
min at 56°C), 0.3 mg/ml L-glutamine, 25 µg/ml gentamicin,
0.5 mM β-mercaptoethanol (pH adjusted with 0.5 ml of 5 M
NaOH)).

Transfection

Calcium phosphate transfection of 293T cells was
performed by slowly adding 10 µg plasmid DNA in 250 µl
0.25 M CaCl2 to the same volume of 2x HEBS buffer while
vortexing. After incubation for 10 to 30 min at room
temperature, the DNA precipitate was added to a small
dish of 50 to 70% confluent cells. In cotransfection
experiments with rev, cells were transfected with 10 µg
gp120IIIb, gp120IIIbrre, syngp120mnrre or rTHY-1enveglrre
and 10 ng of pCMVrev or CDM7 plasmid DNA.

The following solutions were used in this
procedure: 2x HEBS buffer (280 mM NaCl, 10 mM KCl, 1.5 mM
sterile filtered); 0.25 M CaCl2 (autoclaved).
Immunoprecipitation
After 48 to 60 hours the medium was exchanged and
cells were incubated for an additional 12 hours in
Cys/Met-free medium containing 200 µCi of 35S-translabel.
Supernatants were harvested and spun for 15 min at 3000
rpm to remove debris. After addition of the protease
inhibitors leupeptin, aprotinin and PMSF to 2.5 ng/ml,
50 µg/ml and 100 µg/ml respectively, 1 ml of supernatant
was incubated at 4°C for 12 hours on a rotator with either
10 µl of packed protein A sepharose alone
(rTHY-1enveglrre) or with protein A sepharose and 3 ng of
a purified CD4/immunoglobulin fusion protein (kindly
provided by Behring) (all gp120 constructs). Subsequently
the protein A beads were washed 5 times for 5 to 15 min
each time. After the final wash, 10 µl of loading buffer
was added, and samples were boiled for 3 min and applied
on 7% (all gp120 constructs) or 10% (rTHY-1enveglrre) SDS
polyacrylamide gels (Tris pH 8.8 buffer in the resolving
gel, Tris pH 6.8 buffer in the stacking gel, Tris-glycine
running buffer; Maniatis et al. 1989). Gels were fixed
in 10% acetic acid and 10% methanol, incubated with
Amplify for 20 min, dried and exposed for 12 hours.

The following buffers and solutions were used in
this procedure: Wash buffer (100 mM Tris, pH 7.5, 150 mM
NaCl, 5 mM CaCl2, 1% NP-40); 5x Running Buffer (125 mM
Tris, 1.25 M glycine, 0.5% SDS); Loading buffer (10%
glycerol, 4% SDS, 4% β-mercaptoethanol, 0.02% bromphenol
blue).

Immunofluorescence

293T cells were transfected by calcium phosphate
coprecipitation and analyzed for surface THY-1 expression
after 3 days. After detachment with 1 mM EDTA/PBS, cells
were stained with the monoclonal antibody OX-7 at a
dilution of 1:250 at 4°C for 20 min, washed with PBS, and
subsequently incubated with a 1:500 dilution of a
FITC-conjugated goat anti-mouse immunoglobulin antiserum.
Cells were washed again, resuspended in 0.5 ml of a
fixing solution, and analyzed on an EPICS XL
cytofluorometer (Coulter).

The following solutions were used in this
procedure:

PBS (137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4, 1.4 mM
KH2PO4, pH adjusted to 7.4); Fixing solution (2%
formaldehyde in PBS).
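The PBS recipe is given in millimolar. Converting it to grams per liter is a matter of multiplying by molar mass. Below is a minimal sketch. The molar masses are for the anhydrous salts. Hydrated forms, such as Na2HPO4·7H2O, would need larger masses.

```python
# Approximate molar masses (g/mol) of the ANHYDROUS salts; hydrates weigh more.
MOLAR_MASS = {"NaCl": 58.44, "KCl": 74.55, "Na2HPO4": 141.96, "KH2PO4": 136.09}

# PBS composition from the text, in mM
PBS_MM = {"NaCl": 137.0, "KCl": 2.7, "Na2HPO4": 4.3, "KH2PO4": 1.4}

def grams_needed(recipe_mm, volume_l=1.0):
    """Grams of each salt: (mmol/l / 1000) * g/mol * liters."""
    return {salt: mm / 1000.0 * MOLAR_MASS[salt] * volume_l
            for salt, mm in recipe_mm.items()}

for salt, grams in grams_needed(PBS_MM).items():
    print(f"{salt:8s} {grams:.3f} g")  # NaCl 8.006, KCl 0.201, Na2HPO4 0.610, KH2PO4 0.191
```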
ELISA 

The concentration of gp120 in culture supernatants
was determined using CD4-coated ELISA plates and goat
anti-gp120 antisera in the soluble phase. Supernatants
of 293T cells transfected by calcium phosphate were
harvested after 4 days, spun at 3000 rpm for 10 min to
remove debris, and incubated for 12 hours at 4°C on the
plates. After 6 washes with PBS, 100 µl of goat
anti-gp120 antisera diluted 1:200 were added for 2 hours.
The plates were washed again and incubated for 2 hours
with a peroxidase-conjugated rabbit anti-goat IgG
antiserum diluted 1:1000. Subsequently the plates were
washed and incubated for 30 min with 100 µl of substrate
solution containing 2 mg/ml o-phenylenediamine in sodium
citrate buffer. The reaction was stopped with 100 µl of
4 M sulfuric acid. Plates were read at 490 nm with a
Coulter microplate reader. Purified recombinant
gp120IIIb was used as a control. The following buffers
and solutions were used in this procedure: Wash buffer
(0.1% NP40 in PBS); Substrate solution (2 mg/ml
o-phenylenediamine in sodium citrate buffer).
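The text states only that purified gp120IIIb served as a control. One common way to turn OD490 readings into concentrations is to read them off a standard curve built from a dilution series of such a purified protein. Below is a minimal sketch of that approach, using linear interpolation between bracketing standards. The standard values are invented for illustration and are not data from this study.

```python
from bisect import bisect_left

def interpolate_concentration(od, standards):
    """Read a sample concentration off a standard curve by linear interpolation.

    `standards` is a list of (concentration, OD490) pairs whose OD rises
    monotonically with concentration.
    """
    pts = sorted(standards, key=lambda p: p[1])
    ods = [p[1] for p in pts]
    if not ods[0] <= od <= ods[-1]:
        raise ValueError("OD outside the standard range; dilute and re-assay")
    i = bisect_left(ods, od)
    if ods[i] == od:
        return pts[i][0]
    (c0, y0), (c1, y1) = pts[i - 1], pts[i]
    return c0 + (od - y0) * (c1 - c0) / (y1 - y0)

# HYPOTHETICAL standard curve (ng/ml gp120, OD490); not data from this study
STANDARDS = [(0, 0.05), (10, 0.20), (20, 0.35), (40, 0.65), (80, 1.20)]
sample = interpolate_concentration(0.50, STANDARDS)
print(f"{sample:.1f} ng/ml")  # 30.0 ng/ml
```

Readings outside the standard range raise an error rather than being extrapolated, since the linear assumption only holds between measured points.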
Green Fluorescent Protein 

The efficacy of codon replacement for gp120
suggests that replacing non-preferred codons with less
preferred codons or preferred codons (and replacing less
preferred codons with preferred codons) will increase
expression in mammalian cells of other proteins, e.g.,
other eukaryotic proteins.
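The replacement rule described above can be sketched as a codon-by-codon substitution that leaves the encoded protein unchanged. This sketch is not the procedure used to design the synthetic genes. The `PREFERRED` table is an assumption, with one codon per amino acid chosen to match the GC-rich codons that dominate the synthetic sequences in this document's sequence listing.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered by first, second, third base (T, C, A, G)
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# ASSUMED "preferred codon" table (one codon per amino acid), for illustration only
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG",
    "M": "ATG", "F": "TTC", "P": "CCC", "S": "AGC", "T": "ACC", "W": "TGG",
    "Y": "TAC", "V": "GTG",
}

def translate(cds):
    cds = cds.upper()
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))

def optimize(cds):
    """Replace each sense codon with the preferred codon for the same amino acid."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3].upper()
        aa = CODE[codon]
        out.append(codon if aa == "*" else PREFERRED[aa])  # stop codons kept as-is
    return "".join(out)

native = "ATGTTATCTAAATAA"  # hypothetical fragment: Met-Leu-Ser-Lys-stop
print(optimize(native))     # ATGCTGAGCAAGTAA
print(translate(native) == translate(optimize(native)))  # True
```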

The green fluorescent protein (GFP) of the
jellyfish Aequorea victoria (Ward, Photochem. Photobiol.
4:1, 1979; Prasher et al., Gene 111:229, 1992; Cody et
al., Biochem. 32:1212, 1993) has recently attracted
attention for its possible utility as a marker or reporter
for transfection and lineage studies (Chalfie et al.,
Science 263:802, 1994).

Examination of a codon usage table constructed
from the native coding sequence of GFP showed that the
GFP codons favored either A or U in the third position.
The bias in this case favors A less strongly than the bias
of gp120 does, but it is substantial. A synthetic gene was
created in which the natural GFP sequence was
re-engineered in much the same manner as for gp120. In
addition, the translation initiation sequence of GFP was
replaced with sequences corresponding to the translational
initiation consensus. The expression of the resulting
protein was compared with that of the wild-type sequence,
similarly engineered to bear an optimized translational
initiation consensus (FIG. 10, panel B and FIG. 10, panel
C). In addition, the effect of including the mutation
Ser65→Thr, reported to improve the excitation efficiency
of GFP at 490 nm and hence preferred for fluorescence
microscopy (Heim et al., Nature 373:663, 1995), was
examined (FIG. 10, panel D). Codon engineering conferred
a significant increase in expression efficiency (and a
concomitant increase in the percentage of cells apparently
positive for transfection), and the combination of the
Ser65→Thr mutation and codon optimization resulted in a
DNA segment encoding a highly visible mammalian marker
protein (FIG. 10, panel D).
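The third-position bias mentioned above can be measured by tallying the last base of each codon in a coding sequence. Below is a minimal sketch. DNA T stands for mRNA U. The two fragments are hypothetical. They encode the same six residues with A/T-ending and G/C-ending codons respectively, and neither is taken from the actual GFP sequences.

```python
from collections import Counter

def third_position_composition(cds):
    """Fraction of codons ending in A, C, G or T (T corresponds to U in mRNA)."""
    cds = cds.upper().replace("U", "T")
    thirds = Counter(cds[i + 2] for i in range(0, len(cds) - 2, 3))
    total = sum(thirds.values())
    return {base: thirds[base] / total for base in "ACGT"}

def at3_fraction(cds):
    """Fraction of codons ending in A or T/U."""
    comp = third_position_composition(cds)
    return comp["A"] + comp["T"]

# Hypothetical 6-codon fragments encoding the same peptide (K-G-E-L-F-T)
at_rich = "AAAGGAGAACTTTTTACT"
gc_rich = "AAGGGCGAGCTGTTCACC"
print(f"{at3_fraction(at_rich):.2f} vs {at3_fraction(gc_rich):.2f}")  # 1.00 vs 0.00
```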

The above-described synthetic green fluorescent
protein coding sequence was assembled in a similar manner
as for gp120 from six fragments of approximately 120 bp
each, using an assembly strategy that relied on the
ability of the restriction enzymes BsaI and BbsI to
cleave outside of their recognition sequences. Long
oligonucleotides were synthesized which contained
portions of the coding sequence for GFP embedded in
flanking sequences encoding EcoRI and BsaI at one end,
and BamHI and BbsI at the other end. Thus, each
oligonucleotide has the configuration EcoRI/BsaI/GFP
fragment/BbsI/BamHI. The restriction site ends generated
by the BsaI and BbsI sites were designed to yield
compatible ends that could be used to join adjacent GFP
fragments. Each of the compatible ends was designed to
be unique and non-self-complementary. The crude synthetic
DNA segments were amplified by PCR, inserted between
EcoRI and BamHI in pUC9, and sequenced. Subsequently the
intact coding sequence was assembled in a six-fragment
ligation, using insert fragments prepared with BsaI and
BbsI. Two of six plasmids resulting from the ligation
bore an insert of the correct size, and one contained the
desired full-length sequence. Mutation of Ser65 to Thr
was accomplished by standard PCR-based mutagenesis, using
a primer that overlapped a unique BssSI site in the
synthetic GFP.
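BsaI (GGTCTC, cutting 1/5) and BbsI (GAAGAC, cutting 2/6) each leave a 4-nucleotide 5' overhang outside the recognition site. The overhangs can therefore be chosen freely. The design rule in the text, unique and non-self-complementary ends, can be checked mechanically. Below is a minimal sketch. The overhang sequences in the example are hypothetical, not the ones actually used.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def check_overhangs(overhangs):
    """List problems in a set of 4-nt junction overhangs for a one-pot ligation.

    An overhang equal to its own reverse complement can ligate to itself, and
    two junctions whose overhangs are identical or reverse complements of each
    other can join the wrong fragments.
    """
    problems = []
    seen = {}
    for i, oh in enumerate(o.upper() for o in overhangs):
        if oh == revcomp(oh):
            problems.append(f"junction {i}: {oh} is self-complementary")
        for candidate in (oh, revcomp(oh)):
            if candidate in seen:
                problems.append(f"junction {i}: {oh} clashes with junction {seen[candidate]}")
                break
        seen.setdefault(oh, i)
    return problems

# Hypothetical overhangs for five junctions (not the sequences actually used)
for problem in check_overhangs(["AGGA", "GATC", "TCCT", "ACTG", "CTTG"]):
    print(problem)
# junction 1: GATC is self-complementary
# junction 2: TCCT clashes with junction 0
```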

Codon optimization as a strategy for improved expression
in mammalian cells

The data presented here suggest that coding
sequence re-engineering may have general utility for the
improvement of expression of mammalian and non-mammalian
eukaryotic genes in mammalian cells. The results
obtained here with three unrelated proteins, HIV gp120,
the rat cell surface antigen Thy-1 and green fluorescent
protein from Aequorea victoria, suggest that codon
optimization may prove to be a fruitful strategy for
improving the expression in mammalian cells of a wide
variety of eukaryotic genes.

Use

The synthetic genes of the invention are useful
for expressing a protein normally expressed in
mammalian cells in cell culture (e.g., for commercial
production of human proteins such as hGH, TPA, Factor
VII, and Factor IX). The synthetic genes of the
invention are also useful for gene therapy.

Synthetic GFP genes can be used in any application
in which a native GFP gene or other reporter gene can be
used. A synthetic GFP gene which employs more preferred
codons than the native GFP gene can be the basis of a
highly sensitive reporter system. Such a system can be
used, e.g., to analyze the influence of particular
promoter elements or trans-acting factors on gene
expression. Thus, the synthetic GFP gene can be used in
much the same fashion as other reporters, e.g.,
β-galactosidase, have been used.

(ii) TITLE OF INVENTION: HIGH LEVEL EXPRESSION OF PROTEINS
(C) REFERENCE/DOCKET NUMBER: 00786/294001
(A) LENGTH: 24 base pairs
(C) STRANDEDNESS: single
CGCGGGCTAG CCACCGAGAA GCTG
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 196 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

ACCGAGAAGC TGTGGGTGAC CGTGTACTAC GGCGTGCCCG TGTGGAAGAG AGGCCACCAC 60

CACCCTGTTC TGCGCCAGCG ACGCCAAGGC GTACGACACC GAGGTGCACA ACGTGTGGGC 120 

CACCCAGGCG TGCGTGCCCA CCGACCCCAA CCCCCAGGAG GTGGAGCTCG TGAACGTGAC 180 

CGAGAACTTC AACATG 196 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
CCACCATGTT GTTCTTCCAC ATGTTGAAGT TCTC 34 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
GACCGAGAAC TTCAACATGT GGAAGAACAA CAT 33



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
TGGAAGAACA ACATGGTGGA GCAGATGCAT GAGGACATCA TCAGCCTGTG GGACCAGAGC 
CTGAAGCCCT GCGTGAAGCT GACCCCCTGT GCGTGACCTG AACTGCACCG ACCTGAGGAA 
CACCACCAAC ACCAACACAG CACCGCCAAC AACAACAGCA ACAGCGAGGG CACCATCAAG 
GGCGGCGAGA TG 

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
GTTGAAGCTG CAGTTCTTCA TCTCGCCGCC CTT 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GAAGAACTGC AGCTTCAACA TCACCACCAG C



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 195 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

AACATCACCA CCAGCATCCG CGACAAGATG CAGAAGGAGT ACGCCCTGCT GTACAAGCTG 60 

GATATCGTGA GCATCGACAA CGACAGCACC AGCTACCGCC TGATCTCCTG CAACACCAGC 120 

GTGATCACCC AGGCCTGCCC CAAGATCAGC TTCGAGCCCA TCCCCATCCA CTACTGCGCC 180 

CCCGCCGGCT TCGCC 195
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
GAACTTCTTG TCGGCGGCGA AGCCGGCGGG 30 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
GCGCCCCCGC CGGCTTCGCC ATCCTGAAGT GCAACGACAA GAAGTTC 47 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 198 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
GCCGACAAGA AGTTCAGCGG CAAGGGCAGC TGCAAGAACG TGAGCACCGT GCAGTGCACC 60
CACGGCATCC GGCCGGTGGT GAGCACCCAG CTCCTGCTGA ACGGCAGCCT GGCCGAGGAG 120
GAGGTGGTGA TCCGCAGCGA GAACTTCACC GACAACGCCA AGACCATCAT CGTGCACCTG 180
AATGAGAGCG TGCAGATC 198



(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
AGTTGGGACG CGTGCAGTTG ATCTGCACGC TCTC 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
GAGAGCGTGC AGATCAACTG CACGCGTCCC 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 120 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
AACTGCACGC GTCCCAACTA CAACAAGCGC AAGCGCATCC ACATCGGCCC CGGGCGCGCC 60 
TTCTACACCA CCAAGAACAT CATCGGCACC ATCCTCCAGG CCCACTGCAA CATCTCTAGA 120 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GTCGTTCCAC TTGGCTCTAG AGATGTTGCA



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
GCAACATCTC TAGAGCCAAG TGGAACGAC 29 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 131 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

GCCAAGTGGA ACGACACCCT GCGCCAGATC GTGAGCAAGC TGAAGGAGCA GTTCAAGAAC 60 

AAGACCATCG TGTTCACCAG AGCAGCGGCG GCGACCCCGA GATCGTGATG CACAGCTTCA 120 

ACTGCGGCGG C 131 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
GCAGTAGAAG AATTCGCCGC CGCAGTTGA 29 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
TCAACTGCGG CGGCGAATTC TTCTACTGC 29 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 195 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
GGCGAATTCT TCTACTGCAA CACCAGCCCC CTGTTCAACA GCACCTGGAA CGGCAACAAC 60 
ACCTGGAACA ACACCACCGG CAGCAACAAC AATATTACCC TCCAGTGCAA GATCAAGCAG 120 
ATCATCAACA TGTGGCAGGA GGTGGGCAAG GCCATGTACG CCCCCCCCAT CGAGGGCCAG 180 
ATCCGGTGCA GCAGC 195

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
GCAGACCGGT GATGTTGCTG CTGCACCGGA TCTGGCCCTC 40

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
CGAGGGCCAG ATCCGGTGCA GCAGCAACAT CACCGGTCTG 40 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 242 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

AACATCACCG GTCTGCTGCT GCTGCTGACC CGGACGGCGG CAAGGACACC GACACCAACG 60
ACACCGAAAT CTTCCGCGAC GGCGGCAAGG ACACCAACGA CACCGAAATC TTCCGCCCCG 120
GCGGCGGCGA CATGCGCGAC AACTGGAGAT CTGAGCTGTA CAAGTACAAG GTGGTGACGA 180
TCGAGCCCCT GGGCGTGGCC CCCACCAAGG CCAAGCGCGC GGTGGTGCAG CGCGAGAAGC 240
GC 242



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
CGCGGGCGGC CGCTTTAGCG CTTCTCGCGC TGCACCAC 38 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
CGCGGGGGAT CCAAGCTTAC CATGATTCCA GTAATAAGT 39 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 165 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

ATGAATCCAG TAATAAGTAT AACATTATTA TTAAGTGTAT TACAAATGAG TAGAGGACAA 60 

AGAGTAATAA GTTTAACAGC ATCTTTAGTA AATCAAAATT TGAGATTAGA TTGTAGACAT 120 

GAAAATAATA CAAATTTGCC AATACAACAT GAATTTTCAT TAACG 165 
(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
CGCGGGGAAT TCACGCGTTA ATGAAAATTC ATGTTG 



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
CGCGGATCCA CGCGTGAAAA AAAAAAACAT 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 149 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
CGTGAAAAAA AAAAACATGT ATTAAGTGGA ACATTAGGAG TACCAGAACA TACATATAGA 
AGTAGAGTAA TTTGTTTAGT GATAGATTCA TAAAAGTATT AACATTAGCA AATTTTACAA
CAAAAGATGA AGGAGATTAT ATGTGTGAG 



(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



WO 97/11086 PCT/US96/15088

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
CGCGAATTCG AGCTCACACA TATAATCTCC 30 



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
CGCGGATCCG AGCTCAGAGT AAGTGGACAA 30 



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 170 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

CTCAGAGTAA GTGGACAAAA TCCAACAAGT AGTAATAAAA CAATAAATGT AATAAGAGAT 60 

AAATTAGTAA AATGTGAGGA ATAAGTTTAT TAGTACAAAA TACAAGTTGG TTATTATTAT 120 

TATTATTAAG TTTAAGTTTT TTACAAGCAA CAGATTTTAT AAGTTTATGA 170 



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
CGCGAATTCG CGGCCGCTTC ATAAACTTAT AAAATC 36 



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1632 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

CTCGAGATCC ATTGTGCTCT AAAGGAGATA CCCGGCCAGA CACCCTCACC TGCGGTGCCC 60
AGCTGCCCAG GCTGAGGCAA GAGAAGGCCA GAAACCATGC CCATGGGGTC TCTGCAACCG 120
CTGGCCACCT TGTACCTGCT GGGGATGCTG GTCGCTTCCG TGCTAGCCAC CGAGAAGCTG 180
TGGGTGACCG TGTACTACGG CGTGCCCGTG TGGAAGGAGG CCACCACCAC CCTGTTCTGC 240
GCCAGCGACG CCAAGGCGTA CGACACCGAG GTGCACAACG TGTGGGCCAC CCAGGCGTGC 300
GTGCCCACCG ACCCCAACCC CCAGGAGGTG GAGCTCGTGA ACGTGACCGA GAACTTCAAC 360
ATGTGGAAGA ACAACATGGT GGAGCAGATG CATGAGGACA TCATCAGCCT GTGGGACCAG 420
AGCCTGAAGC CCTGCGTGAA GCTGACCCCC CTGTGCGTGA CCCTGAACTG CACCGACCTG 480
AGGAACACCA CCAACACCAA CAACAGCACC GCCAACAACA ACAGCAACAG CGAGGGCACC 540
ATCAAGGGCG GCGAGATGAA CAACTGCAGC TTCAACATCA CCACCAGCAT CCGCGACAAG 600
ATGCAGAAGG AGTACGCCCT GCTGTACAAG CTGGATATCG TGAGCATCGA CAACGACAGC 660
ACCAGCTACC GCCTGATCTC CTGCAACACC AGCGTGATCA CCCAGGCCTG GCCCAAGATC 720
AGCTTCGAGC CCATCCCCAT CCACTACTGC GCCCCCGCCG GCTTCGCCAT CCTGAAGTGC 780
AACGACAAGA AGTTCAGCGG CAAGGGCAGC TGCAAGAACG TGAGCACCGT GCAGTGCACC 840
CACGGCATCC GGCCGGTGGT GAGCACCCAG CTCCTGCTGA ACGGCAGCCT GGCCGAGGAG 900
GAGGTGGTGA TCCGCAGCGA GAACTTCACC GACAACGCCA AGACCATCAT CGTGCACCTG 960
AATGAGAGCG TGCAGATCAA CTGCACGCGT CCCAACTACA ACAAGCGCAA GCGCATCCAC 1020
ATCGGCCCCG GGCGCGCCTT CTACACCACC AAGAACATCA TCGGCACCAT CCGCCAGGCC 1080
CACTGCAACA TCTCTAGAGC CAAGTGGAAC GACACCCTGC GCCAGATCGT GAGCAAGCTG 1140
AAGGAGCAGT TCAAGAACAA GACCATCGTG TTCAACCAGA GCAGCGGCGG CGACCCCGAG 1200
ATCGTGATGC ACAGCTTCAA CTGCGGCGGC GAATTCTTCT ACTGCAACAC CAGCCCCCTG 1260
TTCAACAGCA CCTGGAACGG CAACAACACC TGGAACAACA CCACCGGCAG CAACAACAAT 1320
ATTACCCTCC AGTGCAAGAT CAAGCAGATC ATCAACATGT GGCAGGAGGT GGGCAAGGCC 1380
ATGTACGCCC CCCCCATCGA GGGCCAGATC CGGTGCAGCA GCAACATCAC CGGTCTGCTG 1440
CTGACCCGCG ACGGCGGCAA GGACACCGAC ACCAACGACA CCGAAATCTT CCGCCCCGGC 1500
GGCGGCGACA TGCGCGACAA CTGGAGATCT GAGCTGTACA AGTACAAGGT GGTGACGATC 1560
GAGCCCCTGG GCGTGGCCCC CACCAAGGCC AAGCGCCGCG TGGTGCAGCG CGAGAAGCGC 1620
TAAAGCGGCC GC 1632

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2481 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:



ACCGAGAAGC TGTGGGTGAC CGTGTACTAC GGCGTGCCCG TGTGGAAGGA GGCCACCACC 60
ACCCTGTTCT GCGCCAGCGA CGCCAAGGCG TACGACACCG AGGTGCACAA CGTGTGGGCC 120
ACCCAGGCGT GCGTGCCCAC CGACCCCAAC CCCCAGGAGG TGGAGCTCGT GAACGTGACC 180
GAGAACTTCA ACATGTGGAA GAACAACATG CTGGAGCAGA TGCATGAGGA CATCATCAGC 240
CTGTGGGACC AGAGCCTGAA GCCCTGCGTG AAGCTGACCC CCCTGTGCGT GACCCTGAAC 300
TGCACCGACC TGAGGAACAC CACCAACACC AACAACAGCA CCGCCAACAA CAACAGCAAC 360
AGCGAGGGCA CCATCAAGGG CGGCGAGATG AAGAACTGCA GCTTCAACAT CACCACCAGC 420
ATCCGCGACA AGATGCAGAA GGAGTACGCC CTGCTGTACA AGCTGGATAT CGTGAGCATC 480
CACAACGACA GCACCAGCTA CCGCCTGATC TCCTGCAACA CCAGCGTGAT CACCCAGGCC 540
TGCCCCAAGA TCAGCTTCGA GCCCATCCCC ATCCACTACT GCGCCCCCGC CGGCTTCGCC 600
ATCCTGAAGT GCAACGACAA GAAGTTCAGC GGCAAGGGCA GCTGCAAGAA CGTGACCACC 660
GTGCAGTGCA CCCACGGCAT CCGGCCGGTG GTGAGCACCC AGCTCCTGCT GAACGGCAGC 720








[illegible] [illegible] [illegible] [illegible] CCGACAACGC CAAGACCATC 780
ATCGTGCACC TGAATGAGAG CGTGCAGATC AACTGCACGC GTCCCAACTA CAACAAGCGC 840
AAGCGCATCC ACATCGGCCC CGGGCGCGCC TTCTACACCA CCAAGAACAT CATCGGCACC 900
ATCCGCCAGG CCCACTGCAA CATCTCTAGA GCCAAGTGGA ACGACACCCT GCGCCAGATC 960
GTGAGCAAGC TGAAGGAGCA GTTCAAGAAC AAGACCATCG TGTTCAACCA GAGCAGCGGC 1020
GGCGACCCCG AGATCGTGAT GCACAGCTTC AACTGCGGCG GCGAATTCTT CTACTGCAAC 1080
ACCAGCCCCC TGTTCAACAG CACCTGGAAC GGCAACAACA CCTGGAACAA CACCACCGGC 1140
AGCAACAACA ATATTACCCT CCAGTGCAAG ATCAAGCAGA TCATCAACAT GTGGCAGGAG 1200
GTGGGCAAGG CCATGTACGC CCCCCCCATC GAGGGCCAGA TCCGGTGCAG CAGCAACATC 1260
ACCGGTCTGC TGCTGACCCG CGACGGCGGC AAGGACACCG ACACCAACGA CACCGAAATC 1320
TTCCGCCCCG GCGGCGGCGA CATGCGCGAC AACTGGAGAT CTGAGCTGTA CAAGTACAAG 1380
GTGGTGACGA TCGAGCCCCT GGGCGTGGCC CCCACCAAGG CCAAGCGCCG CGTGGTGCAG 1440
CGCGAGAAGC GGGCCGCCAT CGGCGCCCTG TTCCTGGGCT TCCTGGGGGC GGCGGGCAGC 1500
ACCATGCGGG CCGCCAGCGT GACCCTGACC GTGCAGGCCC GCCTGCTCCT GAGCGGCATC 1560
GTGCAGCAGC AGAACAACCT CCTCCGCGCC ATCGAGGCCC AGCAGCATAT GCTCCAGCTC 1620
ACCGTGTGGG GCATCAAGCA GCTCCAGGCC CGCGTGCTGG [illegible] CTACCTGAAG 1680
GACCAGCAGC TCCTGGGCTT CTGGGGCTGC TCCGGCAAGC [illegible] CACCACGGTA 1740
CCCTGGAACG CCTCCTGGAG CAACAAGAGC CTGGACGACA [illegible] CATGACCTGG 1800
ATGCAGTGGG AGCGCGAGAT CGATAACTAC [illegible] [illegible] GCTGGAGAAG 1860
AGCCAGACCC AGCAGGAGAA GAACGAGCAG GAGCTGCTGG AGCTGGACAA CTGGGCGAGC 1920
CTGTGGAACT GGTTCGACAT CACCAACTCC [illegible] TCAAAATCTT CATCATGATT 1980
GTGGGCGGCC TGGTGGGCCT [illegible] [illegible] TGAGCATCGT GAACCGCGTG 2040
CGCCAGGGCT ACAGCCCCCT [illegible] [illegible] CCGTGCCGCG CGGGCCCGAC 2100
CGCCCCGAGG GCATCGAGGA [illegible] [illegible] GCGACACCAG CGGCAGGCTC 2160
GTGCACGGCT TCCTGGCGAT CATCTGGGTC GACCTCCGCA GCCTGTTCCT GTTCAGCTAC 2220
CACCACCGCG ACCTGCTGCT GATCGCCGCC CGCATCGTGG AACTCCTAGG CCGCCGCGGC 2280
TGGGAGGTGC TGAAGTACTG GTGGAACCTC CTCCAGTATT GGAGCCAGGA GCTGAAGTCC 2340
AGCGCCGTGA GCCTGCTGAA CGCCACCGCC ATCGCCGTGG CCGAGGGCAC CGACCGCGTG 2400
ATCGAGGTGC TCCAGAGGGC CGGGAGGGCG ATCCTGCACA TCCCCACCCG CATCCGCCAG 2460
GGGCTCGAGA GGGCGCTGCT G 2481



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 486 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

ATGAATCCAG TAATAAGTAT AACATTATTA TTAAGTGTAT TACAAATGAG TAGAGGACAA 60 

AGAGTAATAA GTTTAACAGC ATGTTTAGTA AATCAAAATT TGAGATTAGA TTGTAGACAT 120 

GAAAATAATA CACCTTTGCC AATACAACAT GAATTTTCAT TAACGCGTGA AAAAAAAAAA 180 

CATGTATTAA GTGGAACATT AGGAGTACCA GAACATACAT ATAGAAGTAG AGTAAATTTG 240 

TTTAGTGATA GATTCATAAA AGTATTAACA TTAGCAAATT TTACAACAAA AGATGAAGGA 300 

GATTATATGT GTGAGCTCAG AGTAAGTGGA CAAAATCCAA CAAGTAGTAA TAAAACAATA 360 

AATGTAATAA GAGATAAATT AGTAAAATGT GGAGGAATAA GTTTATTAGT ACAAAATACA 420 

AGTTGGTTAT TATTATTATT ATTAAGTTTA AGTTTTTTAC AAGCAACAGA TTTTATAAGT 480 

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 485 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 



ATGAACCCAG TCATCAGCAT CACTCTCCTG CTTTCAGTCT TGCAGATGTC CCGAGGACAG 60
AGGGTGATCA GCCTGACAGC CTGCCTGGTG AACAGAACCT TCGACTGGAC TGCCGTCATG 120
AGAATAACAC CAACTTGCCC ATCCAGCATG AGTTCAGCCT GACCCGAGAG AAGAAGAAGC 180
ACGTGCTGTC AGGCACCCTG GGGGTTCCCG AGCACACTTA CCGCTCCCGC GTCAACCTTT 240
TCAGTGACCG CTTTATCAAG GTCCTTACTC TAGCCAACTT GACCACCAAG GATGAGGGCG 300
ACTACATGTG TGAACTTCGA GTCTCGGGCC AGAATCCCAC AAGCTCCAAT AAAACTATCA 360
ATGTGATCAG AGACAAGCTG GTCAAGTGTG GTGGCATAAG CCTGCTGGTT CAAAACACTT 420
CCTGGCTGCT GCTGCTCCTG CTTTCCCTCT CCTTCCTCCA AGCCACGGAC TTCATTTCTC 480
TGTGA 485



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
CGCGGGGCTA GCGCAAAGAG TAATAAGTTT AAC 33 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
CGCGGATCCC TTGTATTTTG TACTAATA 28



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 762 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 



GAATTCACGC GTAAGCTTGC CGCCACCATG GTGAGCAAGG GCGAGGAGCT GTTCACCGGG 60
GTGGTGCCCA TCCTGGTCGA GCTGGACGGC GACGTGAACG GCCACAAGTT CAGCGTGTCC 120
GGCGAGGGCG AGGGCGATGC CACCTACGGC AAGCTGACCC TGAAGTTCAT CTGCACCACC 180
GGCAAGCTGC CCGTGCCCTG GCCCACCCTC GTGACCACCT TCAGCTACGG CGTGCAGTGC 240
TTCAGCCGCT ACCCCGACCA CATGAAGCAG CACGACTTCT TCAAGTCCGC CATGCCCGAA 300
GGCTACGTCC AGGAGCGCAC CATCTTCTTC AAGGACGACG GCAACTACAA GACCCGCGCC 360
GAGGTGAAGT TCGAGGGCGA CACCCTGGTG AACCGCATCG AGCTGAAGGG CATCGACTTC 420
AAGGAGGACG GCAACATCCT GGGGCACAAG CTGGAGTACA ACTACAACAG CCACAACGTC 480
TATATCATGG CCGACAAGCA GAAGAACGGC ATCAAGGTGA ACTTCAAGAT CCGCCACAAC 540
ATCGAGGACG GCAGCGTGCA GCTCGCCGAC CACTACCAGC AGAACACCCC CATCGGCGAC 600
GGCCCCGTGC TGCTGCCCGA CAACCACTAC CTGAGCACCC AGTCCGCCCT GAGCAAAGAC 660
CCCAACGAGA AGCGCGATCA CATGGTCCTG CTGGAGTTCG TGACCGCCGC CGGGATCACT 720
CACGGCATGG ACGAGCTGTA CAAGTAAAGC GGCCGCGGAT CC 762
What is claimed is:

1. A synthetic gene encoding a protein normally expressed in a eukaryotic cell wherein at least one non-preferred or less preferred codon in the natural gene encoding said protein has been replaced by a preferred codon encoding the same amino acid.

2. The synthetic gene of claim 1 wherein said synthetic gene is capable of expressing said eukaryotic protein at a level which is at least 110% of that expressed by said natural gene in an in vitro mammalian cell culture system under identical conditions.

3. The synthetic gene of claim 1 wherein said synthetic gene is capable of expressing said eukaryotic protein at a level which is at least 150% of that expressed by said natural gene in an in vitro cell culture system under identical conditions.

4. The synthetic gene of claim 1 wherein said synthetic gene is capable of expressing said eukaryotic protein at a level which is at least 200% of that expressed by said natural gene in an in vitro cell culture system under identical conditions.

5. The synthetic gene of claim 1 wherein said synthetic gene is capable of expressing said eukaryotic protein at a level which is at least 500% of that expressed by said natural gene in an in vitro cell culture system under identical conditions.

6. The synthetic gene of claim 1 wherein said synthetic gene is capable of expressing said eukaryotic protein at a level which is at least ten times that expressed by said natural gene in an in vitro cell culture system under identical conditions.

7. The synthetic gene of claim 1 wherein at least 10% of the codons in said natural gene are non-preferred codons.

8. The synthetic gene of claim 8 wherein at least 50% of the codons in said natural gene are non-preferred codons.

9. The synthetic gene of claim 1 wherein at least 50% of the non-preferred codons and less preferred codons present in said natural gene have been replaced by preferred codons.

10. The synthetic gene of claim 1 wherein at least 90% of the non-preferred codons and less preferred codons present in said natural gene have been replaced by preferred codons.



11. The synthetic gene of claim 1 wherein said protein is green fluorescent protein.

12. A method for preparing a synthetic gene encoding a protein normally expressed by eukaryotic cells, comprising identifying non-preferred and less-preferred codons in the natural gene encoding said protein and replacing one or more of said non-preferred and less-preferred codons with a preferred codon encoding the same amino acid as the replaced codon.
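
Claims 7 to 10 and 12 turn on sorting the codons of a natural gene into preferred and non-preferred ones and swapping the latter for a preferred codon that encodes the same amino acid. The Python sketch below illustrates that procedure and the two percentages the claims refer to (the share of codons that are non-preferred, and the share of those that have been replaced). It is a minimal sketch under stated assumptions: the `PREFERRED` table is hypothetical, with one codon per amino acid chosen for illustration rather than taken from this document; non-preferred and less preferred codons are treated alike; and all function names are illustrative. The first 60 bases of SEQ ID NO: 36 serve only as sample input.

```python
# Minimal sketch of codon replacement (claim 12) and the percentages in claims 7-10.
# ASSUMPTION: PREFERRED is a hypothetical one-codon-per-amino-acid table chosen for
# illustration, not the preferred-codon table of this patent; "non-preferred" and
# "less preferred" codons are both treated as "any codon other than PREFERRED[aa]".

BASES = "TCAG"
# Standard genetic code for codons ordered TTT, TTC, TTA, ..., GGG ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Hypothetical preferred codon for each amino acid (illustrative assumption).
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def codons(seq):
    """Split an in-frame coding sequence into codons, ignoring spaces."""
    seq = seq.upper().replace(" ", "")
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate with the standard genetic code."""
    return "".join(CODE[c] for c in codons(seq))


def is_replaceable(codon):
    """True for a sense codon that is not the preferred codon for its amino acid."""
    aa = CODE[codon]
    return aa != "*" and codon != PREFERRED[aa]


def optimize(natural):
    """Replace every replaceable codon with the preferred synonymous codon."""
    synthetic = "".join(PREFERRED[CODE[c]] if is_replaceable(c) else c
                        for c in codons(natural))
    assert translate(synthetic) == translate(natural)  # same amino acids
    return synthetic


def non_preferred_fraction(natural):
    """Share of sense codons in the natural gene that are not preferred."""
    sense = [c for c in codons(natural) if CODE[c] != "*"]
    return sum(is_replaceable(c) for c in sense) / len(sense)


def replaced_fraction(natural, synthetic):
    """Share of the natural gene's non-preferred codons now carrying the preferred codon."""
    if translate(natural) != translate(synthetic):
        raise ValueError("genes do not encode the same protein")
    pairs = [(n, s) for n, s in zip(codons(natural), codons(synthetic))
             if is_replaceable(n)]
    if not pairs:
        raise ValueError("natural gene has no non-preferred codons")
    return sum(s == PREFERRED[CODE[n]] for n, s in pairs) / len(pairs)


# First 60 bases of SEQ ID NO: 36, used here only as sample input.
seq36 = "ATGAATCCAG TAATAAGTAT AACATTATTA TTAAGTGTAT TACAAATGAG TAGAGGACAA"
syn = optimize(seq36)
print(translate(seq36))                # MNPVISITLLLSVLQMSRGQ
print(non_preferred_fraction(seq36))   # 0.9
print(replaced_fraction(seq36, syn))   # 1.0
```

Claim 12 also covers replacing only some of the non-preferred codons; in this sketch that would mean applying `PREFERRED` at a subset of positions, after which `replaced_fraction` reports the share actually changed, which is the quantity thresholded at 50% and 90% in claims 9 and 10.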

FIG. 1: nucleotide sequences of SEQ ID NO: 34 and SEQ ID NO: 35.

WO 97/11086 PCT/US96/15088

Drawing sheet: nucleotide sequence of SEQ ID NO: 40.